Executive Summary



The Elementary and Secondary Education Act (ESEA), as amended by the No Child Left Behind Act of 2001,
inaugurated important changes in assessment and accountability for English Learner (EL) students. 
Specifically, Title III of the law required states to develop or adopt English-language proficiency (ELP) 
standards aligned with language demands of academic content standards. An annually administered ELP 
assessment based on those standards was also required by the ESEA (NCLB 2002). Title III also
instituted new accountability requirements for districts and states. These new EL accountability 
provisions required states to define criteria for progress in learning English, establish a performance 
standard for English proficiency, and set annually increasing performance targets for the number and 
percentage of ELs meeting these criteria. 

As has been well documented, the new law’s requirements exceeded the technical capacity of many states 
and districts to comply with it (Abedi 2004; Government Accountability Office 2006). Over the past 
several years, empirical research with more rigorous ELP assessments, systematic technical assistance 
efforts, and federal guidance have helped to reduce confusion and increase coherence in state Title III 
accountability systems (Abedi 2007; Linquanti and George 2007; Cook et al. 2008; Federal Register 
2008). Nevertheless, a significant need remains to develop the capacity of state and technical assistance 
providers to utilize empirical data for performance standard setting and accountability policy
development in these areas. Even as ESEA reauthorization draws closer, prevailing civil rights laws and 
sustained focus on improving EL education suggest these policy-making needs will grow in importance, 
particularly given broad adoption of Common Core State Standards and establishment of related 
multistate academic and ELP assessment consortia. 

This document is intended to contribute to that capacity development by describing and illustrating 
several empirical methods and conceptual or theoretical rationales to help state policy-makers, standard- 
setting panels, and the technical advisory panels and assistance providers to (1) determine a meaningful 
ELP performance standard; (2) establish a realistic, empirically anchored time frame for attaining a given 
ELP performance standard; and (3) take into account an EL’s ELP level when setting academic progress 
and proficiency expectations. This is by design a technical document intended to assist those charged with 
providing empirical information germane to developing or revising EL accountability models, using ELP 
and academic assessments. 

This volume does not focus on several additional basic issues around EL student achievement because 
there is a companion volume (Taylor, Chinen, et al., forthcoming) that analyzes similar state- and district- 
provided student-level longitudinal achievement data and descriptively addresses those issues. That 
companion volume describes (1) the heterogeneity of the EL population and the different achievement 
statuses and trajectories of ELs with different characteristics; (2) the estimated achievement gaps among 
ELs, former ELs, and non-ELs; (3) a basic description of the typical time frame for attaining English
language proficiency; and (4) the nature of the relationship between assessment scores measuring
language acquisition and academic-content-area learning. The chapters of this current volume represent 
logical extensions building on those more basic descriptive analyses. 

Chapter I positions data-analysis methods illustrated in the report within a larger deliberative process of 
setting meaningful, ambitious, and realistic performance standards and accountability criteria for EL 
students. The chapter offers guidelines for enacting best practices in standard setting, and highlights 
limitations in using empirical data.

Chapter II illustrates three methods (decision consistency, logistic regression, and descriptive box plots)
for analyzing empirical data to assist policymakers in determining an ELP performance standard for
English Learners. These methods are used in conjunction in three states, and the results, which vary in
their degree of convergence within each state, are interpreted to show how they might be utilized to
support each state’s decision-making process.

Chapter III illustrates methods for conducting empirical analyses to inform setting expected time frames 
for EL students to attain the ELP performance standard. Specifically, a descriptive approach and an 
event history approach are applied with various adjustments. Results are compared for EL students in 
different grade clusters with different levels of initial English language proficiency. As students’ initial 
ELP level influences the expected time frame for their attaining the English-proficient criterion, these 
data are used to illustrate the ways in which more refined time-to-English-proficiency criteria could be
derived. 

Finally, Chapter IV discusses two methods for taking into account an EL’s ELP level when setting 
academic progress and proficiency expectations, and one method that explicitly ignores it. First,
progressive benchmarking methods are illustrated that adjust either EL students’ content achievement 
scale scores or their weight (individual “count”), based on each student’s ELP level relative to their initial 
ELP level and time in the state school system. Second, an indexed progress method utilizes ELs’ ELP
growth as a proxy for English language arts performance on a weighted, time-sensitive basis for more 
newly-arrived ELs who enter the state’s school system at lower initial ELP levels. Third, a status and 
growth accountability matrix method credits both a predetermined level of student academic growth as 
well as attainment of academic proficiency, without considering an English Learner’s ELP level. Each 
method is carefully described and applied using the same education agency’s sample data set. 

All the approaches presented in this document, many of which have been employed by the principal
authors in working with states on their EL accountability systems, are intended to stimulate discussion
and further exploration of additional methods among state data analysts, technical assistance providers, 
and researchers. The ultimate goal is to support the development and regular use of empirical methods 
that inform ambitious, realistic, and meaningful performance standards and accountability policies, which 
will foster EL students’ linguistic and academic progress and attainment.

I. Introduction



The Elementary and Secondary Education Act (ESEA), as reauthorized by the No Child Left Behind Act of
2001, inaugurated important changes in assessment and accountability for English Learner (EL) students.
Specifically, Title III of the ESEA required states to develop
English-language-proficiency (ELP) standards aligned with language demands of academic content
standards, and an ELP assessment based on those ELP standards that would measure English Learners’
progress in developing the language needed to attain academic proficiency (Public Law 107-110).
Moreover, Title I and Title III of the amended ESEA required states to assess each EL’s ELP annually.
Title III also required states to define criteria for progress in learning English, establish a performance
standard for the English-proficient level, and set annually increasing performance targets for the number
and percentage of ELs meeting these criteria. Specifically, states were required by 2003 to set annual
measurable achievement objectives (AMAOs) for the percentage of ELs in each Title III-funded local
education agency making progress (AMAO 1), attaining the English-proficient level (AMAO 2), and
attaining the adequate yearly progress (AYP) targets in English reading or language arts (ELA) and
mathematics, as required of the EL subgroup for school districts under Title I (AMAO 3).

As has been well documented, the law’s requirements exceeded the technical capacity of many states and 
districts to comply with it (Abedi 2004; Government Accountability Office 2006). Most states lacked 
empirical data — and indeed, even standards-based ELP assessments — with which to determine progress 
criteria and performance targets. The majority of states initially used off-the-shelf ELP assessments and
set arbitrary, often unrealistic performance criteria and targets for AMAOs 1 and 2. For example, some
states expected ELs to progress equally in every domain annually, while others judged progress as
advancing in any language domain, or used mean scale score gains of the entire cohort. One state set the
highest language proficiency level as its expected performance standard for all ELs; another implicitly
allowed ELs under AMAO 1 to progress for 10 years, but required them under AMAO 2 to attain 
English proficiency within five years. Many states assumed that AMAO 1 and 2 targets needed to reach 
100 percent by 2014, as with Title I AYP targets. 

Over the past several years, empirical research with more rigorous ELP assessments, systematic technical 
assistance efforts, and federal guidance have each helped to reduce confusion and increase coherence in 
state Title III accountability systems (Abedi 2007; Linquanti and George 2007; Cook et al. 2008; Federal
Register 2008). These efforts notwithstanding, there is still a significant need to develop the capacity of
state and technical assistance providers to utilize empirical data for performance standard setting and
accountability policy development in these areas. The current document seeks to contribute to that 
capacity development. 

Specifically, this document offers several empirical methods that state policy-making authorities can use
as part of a larger deliberative process to set ELP performance standards and operationalize ELP
assessment and accountability criteria. This document describes and illustrates several empirical methods
and conceptual or theoretical rationales to help state policymakers, standard-setting panels, and the
technical advisory panels and assistance providers supporting them to

- Determine a meaningful ELP performance standard.

- Establish a realistic, empirically anchored time frame for attaining a given ELP performance
  standard.

- Take into account an EL’s ELP level and time in the school system when setting academic
  progress toward proficiency expectations.

This is by design a technical document intended to inform those charged with implementing a larger
policy-making judgmental process. The intended audience is state assessment and accountability
directors with responsibility for overseeing state performance standard setting, and establishing progress
criteria and performance target structures. Secondary audiences include technically inclined Title I and
Title III state program directors, senior state education agency leaders, technical assistance providers, and
those directly advising governance boards and state boards of education.

This volume, also by design, does not focus on several additional basic issues around EL student
achievement because there is a companion volume (Taylor, Chinen, et al., forthcoming) that analyzes
similar state- and district-provided student-level longitudinal achievement data and descriptively
addresses those issues. That companion volume describes (1) the heterogeneity of the EL population
and the different achievement statuses and trajectories of ELs with different characteristics; (2) the
estimated achievement gaps among ELs, former ELs, and non-ELs; (3) a basic description of the typical
time frame for attaining English language proficiency; and (4) the nature of the relationship between
assessment scores measuring language acquisition and academic-content-area learning. The chapters of
this current volume represent logical extensions building on those more basic descriptive analyses.

Using Multiple Methods and Empirical Data to Set Performance 
Standards 

The data methods illustrated in this report offer tools for states to use as part of a larger policy process
of setting meaningful, ambitious, and realistic performance standards for EL students. The testing and
measurement field has developed a substantial literature around methods, procedures and protocols for
appropriately setting performance standards (e.g., see Hambleton and Pitoniak 2006 for a review of
these).

Establishing performance standards is not of concern just to testing and measurement professionals; it is
also of interest to those who establish criteria for educational accountability. Educational accountability
requirements are like performance standards, but instead of being for individual students, they are for
educational entities such as schools, districts or states. When states established accountability models
under the requirements of the No Child Left Behind Act of 2001, they employed many common
performance-standard-setting procedures. In fact, federal guidance requires that states employ common
standard-setting procedures when developing ESEA Title I accountability systems.¹ The following
paragraphs briefly review common standard-setting practices and discuss the ways in which they might
be used to support establishing the accountability requirements for Title III (i.e., AMAOs 1 and 2).

Hambleton and Pitoniak (2006, 435), in their chapter on setting performance standards in Educational
Measurement, 4th Edition, note, “[T]he setting of performance standards is a blend of judgment,
psychometrics and practicality.” The Standards for Educational and Psychological Testing (American
Educational Research Association, American Psychological Association and National Council on
Measurement in Education 1999, 54) also state, “[D]etermining cut scores . . . cannot be purely a
technical matter, although empirical studies and statistical models can be of great value in informing the
process.” Given high-stakes consequences that flow from performance standards, assessment and
accountability experts have warned that outcomes of performance-standard-setting processes can vary by
the way in which judges are selected and trained, and by methods used to set standards (Linn 2003;
Haertel 2002; Haertel 2008). Thus, a foundational principle to consider when establishing standards (or
accountability models) is that standard setting is not merely an empirical procedure; rather, it requires
both empirical information germane to the intended uses of the performance standard and informed
judges engaged in a rational, coherent and transparent deliberative process. These requirements cannot
be overemphasized in setting expectations for EL performance, given the complex interrelationship of
second-language progress and proficiency to academic-content-area progress and proficiency.

1. See U.S. Department of Education, Office of Elementary and Secondary Education (Jan. 12, 2009). Standards
and Assessments Peer Review Guidance: Information and Examples for Meeting Requirements of the No Child Left
Behind Act of 2001. Downloaded at http://www2.ed.gov/policy/elsec/guid/saaprguidance.pdf, p. 27 (Critical Element
2.6).

Hambleton and Pitoniak (2006, 436) also note, “[S]ometimes it is suggested that two or even
three [standard-setting] methods be implemented so that the results can be compared, but this is usually
an expensive and time-consuming process. ...” That is, if feasible, more than one method should be
employed. Another foundational principle to consider when establishing standards is to use several
approaches or methods to the extent practicable. Multiple approaches provide more information for the
judgment process. Akin to Hambleton and Pitoniak’s point, the Standards for Educational and Psychological
Testing (AERA, APA and NCME 1999, 146), Standard 13.7, states:

In educational settings, a decision or characterization that will have a major impact on a 
student should not be made on the basis of a single test score. Other relevant 
information should be taken into account if it will enhance the overall validity of the 
decision. 

The notion of using more than a single data point as a basis for important decisions about students is 
also applicable to teachers, schools and districts. Additional empirical evidence can be used to support 
important assessment and accountability policy decisions, such as establishing performance criteria and 
reasonable targets for AMAOs. 

Hambleton and Pitoniak provide further guidance via a list of procedures to follow in order to develop 
reasonable performance standards. They write (436), “[T]he defensibility of the resulting performance
standard is considerably increased if the process reflects careful attention to: (1) the selection of method; 
(2) the selection and training of panelists; (3) the sequence of activities in the process; (4) validation; and 
(5) careful documentation of the process.” The authors argue that these procedures will support the 
defensibility (acceptability) of newly established performance standards. 

An appropriate standard-setting method (or multiple methods) should be used to support establishing 
standards (accountability criteria). Panelists (or judges) should be selected and trained. Regarding
panelists, Standard 1.7 of the Standards for Educational and Psychological Testing requires that they have
appropriate qualifications (i.e., experiences to support setting standards, lack of vested interest in 
standards set, and relevant training). A clear process for establishing standards should also be developed 
and implemented. A mechanism should be in place to validate the decisions made by established criteria, 
and the entire process and decisions made should be carefully documented and reported. 

Applied to Title III AMAOs, the following guidelines, derived from best practices in
performance-standard setting, can productively support establishing reasonable ELP performance 
criteria and, as argued here, strong accountability policies: 

- To the extent practicable, employ a variety of analytic methods and procedures when establishing
  AMAO criteria.

- Select analytic methods that appropriately inform the decisions to be made on the basis of
  intended uses of the performance standards; this includes the generation of outcome data to
  model likely outcomes based on current student performance.

- When establishing AMAOs, use both empirical data analyses and expert judges in a well-defined
  deliberative process.

- Select panelists who are sufficiently qualified (e.g., expert and experienced with ELs) to support
  setting AMAO-related criteria.

- Train panelists adequately. In that training, panelists should understand that different analytic
  methods will likely yield different results. It is an essential part of their charge to utilize this
  information and their expert judgment to make the best possible decision.

- Clearly define and follow a rational and coherent sequential process for establishing AMAOs.

- Document the process clearly. This documentation should provide relevant stakeholders
  information about how AMAOs were set; who panelists were and why and how they were
  selected; what analyses and procedures were used; what options and scenarios were considered;
  what deliberations occurred; and what final recommendations were made.

- Design validity studies on newly established AMAO criteria. Such research is necessary for
  examining outcomes (consequences) relative to the intent of criteria.

Adherence to these guidelines will support more realistic and productive decisions on ELP performance 
criteria and AMAO accountability policies. As Mehrens and Cizek (2001, 484) note:

In one way or another, setting performance standards is unavoidable. Categorical 
decisions will be made. These decisions can be made capriciously or they can be 
accomplished using the sound procedures at hand today, or via the almost assuredly 
better methods that continue to be the product of psychometric research and 
development. 

In recent years, many states have revisited and revised their ELP assessments, ELP performance 
standards, and AMAO accountability provisions. While it is beyond the scope of this document to fully 
illustrate the guidelines delineated above, the material that follows will provide several analytic methods 
applied to authentic state and district data sets. These worked examples illustrate ways to support 
establishing rigorous and realistic ELP performance criteria and meaningful AMAOs. 

Limitations to Empirical Approaches

As this document illustrates several empirical methods to support policy decision making, it is important 
to note the limitations of these methods and the value of using several appropriate methods in 
conjunction, when possible. No one empirical method gives a complete picture. As Box and Draper 
famously note regarding statistical models (1987, 74), “Essentially, all models are wrong, but some are 
useful.” Statistical models are imperfect representations. The methods used to support decision making 
seldom provide definitive answers. Nonetheless, they do provide valuable evidence to inform 
deliberations and decision making and to understand the potential consequences of decisions. 

In a related vein, data are almost always imperfect. For example, as will be seen in the following chapters, 
data sets used for AMAO analyses often have missing cases, and reasons for missing data may not be 
apparent. Furthermore, some data patterns are influenced by the way the EL construct is defined and 
operationalized. Two key implications of the latter phenomenon merit attention here:

First, EL status is intended to be temporary and to change as a direct result of high-quality
language-instruction educational services. Therefore, more successful ELs exit language instructional programs as
they reach required levels of ELP (and often, academic performance). A natural consequence of this fact 
is that faster progressing, higher attaining students exit the EL category sooner than slower growing 
students. 

Second, students who reach English proficiency no longer participate in the state language-proficiency 
assessments. As a result, when looking at longer term, longitudinal results for current ELs, the observed 
growth is likely an underestimate relative to that of the total cohort of students who entered the system 
needing to learn English as a second language, because it contains only the results of those EL students 
who continued not to meet exit criteria. 
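
The direction of this bias can be shown with a small worked example. The following Python sketch uses an entirely hypothetical four-student cohort, an assumed exit score of 500, and the simplifying assumption that each student grows by a constant amount each year. None of these values come from the state or district data analyzed in this report.

```python
from statistics import mean

EXIT_SCORE = 500  # assumed ELP exit criterion (hypothetical)

# Hypothetical cohort: starting ELP scale score and constant annual growth.
cohort = [
    {"start": 400, "growth": 60},
    {"start": 420, "growth": 50},
    {"start": 380, "growth": 30},
    {"start": 390, "growth": 20},
]


def score_after(student, years):
    """ELP score after a number of years, assuming constant annual growth."""
    return student["start"] + years * student["growth"]


# Students at or above the exit score after two years leave the EL group
# and are no longer assessed on the ELP test in year 3.
still_el = [s for s in cohort if score_after(s, 2) < EXIT_SCORE]

full_cohort_growth = mean(s["growth"] for s in cohort)
observed_growth = mean(s["growth"] for s in still_el)

print(f"Mean annual growth, full entering cohort: {full_cohort_growth}")
print(f"Mean annual growth, students still EL in year 3: {observed_growth}")
```

In this toy cohort the two faster-growing students exit after year 2, so the year-3 average computed from current ELs (25 points) falls below the average for the full entering cohort (40 points).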

It is therefore important for panelists to understand these kinds of limitations in all analytic methods and 
data utilized in policy-setting deliberations. The examples provided in the following chapters attempt to
illustrate how several empirical methods can be used in combination with actual state and school district
data, and how the limitations of both methods and data can be highlighted for decision makers who
utilize them.

II. Determining an English-Language-Proficient Performance Standard



Overview 

This chapter illustrates how empirical data might be analyzed to assist policymakers in determining an 
English-language-proficient (ELP) performance standard for English Learners (ELs). Specifically, the
chapter presents three methods for analyzing data related to this issue, applies these methods in
three states, and discusses how the results might be interpreted and utilized to support decision making.
As the analyses to come illustrate, state context matters greatly. Applying the methods in states with
different EL populations and different ELP and academic assessments and cut scores will yield different 
degrees of convergence across the methods, and results need to be interpreted carefully within this 
context. However, data from these methods can ground policy deliberations and help clarify the 
implications of different options under consideration. 

Implicit in analyses presented in this chapter are two assumptions: (1) State academic-content-area 
performance standards are generally set independently of ELP performance standards, so empaneled
judges must accept the current performance standards for a state’s academic assessments as 
unmodifiable; and (2) the state’s academic-content performance standards have been established 
appropriately and will lead to valid inferences of students’ academic knowledge, skills, and abilities. 
Interpretations of findings from the methods shared below hinge particularly on this second assumption. 
If content-area performance standards are set inappropriately or validity concerns exist about the
performance standards or content assessment from which performance standards were created, the 
methods shared below will be correspondingly compromised. 

The Elementary and Secondary Education Act of 1965 (ESEA), as amended by the No Child Left Behind Act of
2001 (NCLB), defines a limited-English-proficient (LEP) student² as an elementary or secondary school
student



whose difficulties in speaking, reading, writing, or understanding the English language
may be sufficient to deny the individual the ability to meet the State’s proficient level of
achievement on State [academic] assessments [italics added] described in section 1111(b)(3); the
ability to successfully achieve in classrooms where the language of instruction is English;
or the opportunity to participate fully in society. §9101(25)(D)

This definition implies that a key indicator of having sufficiently addressed the linguistic needs of EL
students is their performance on state content assessments, specifically, how able these students are to
attain the state’s proficient performance standard on its academic content assessments. NCLB requires
that states’ English-language-proficiency (ELP) standards and associated ELP assessments be “aligned
with” academic content and performance standards (§3113(b)(2)). These aspects of the federal law imply
an expected relationship between students’ ELP and levels of academic proficiency when content is
assessed in English. Moreover, this relationship is reinforced in the recently announced federal enhanced
assessment grant program for next-generation ELP assessment systems. These new ELP assessments are
expected to “indicate whether individual [EL] students have attained the English proficiency necessary to
participate fully in academic instruction in English and meet or exceed college- and career-ready standards [italics
added]” (Federal Register 2011, 21978).

2. Federal law uses “limited English proficient,” or LEP, to designate linguistic-minority students whose English-
language skills inhibit their ability to benefit from mainstream instruction in English. In the research literature and in
most states, the term “English Learner” (EL) is used, as it is in this report.

State policymakers, therefore, need to examine the relationship between their state’s ELP assessments 
and academic-content assessments as they determine what levels of linguistic and academic performance 
will be used to operationalize a definition of EL.³

To date, studies investigating the relationship between ELP and academic-content performance have 
found positive relationships between both assessment types.⁴ These studies use either correlational or
regression-based approaches to confirm relationships between ELP and academic-content assessments. 

This chapter illustrates how states can examine the relationship between EL performance on ELP 
assessments and academic-content assessments for the purpose of identifying where a proficient 
performance standard on the ELP assessment might be established. States are required to define this 
performance standard under Title III of ESEA (specifically, to address the title’s second annual
measurable achievement objective [AMAO 2]), and several states have used empirical methods to 
explore this definition (see Linquanti and George 2007; Cook and others 2008). To do so, policymakers 
must first clarify what is meant by the term English-language proficient. 

The federal definition of limited English proficiency specified in the law suggests that, when students’
English-language skills are sufficient to (1) no longer be denied the ability to meet a state’s proficient
performance standard on its academic-content assessments and (2) be able to achieve success in
English-only classrooms, these students may be classified as fully English proficient and exited from
specialized language and academic support services. That is, when ELs’ English proficiency no longer
inhibits their meaningful participation on state assessments or in the classroom using English, they may
be classified as fully English proficient. Note that the federal definition does not require that ELs be
academically proficient in order to be classified as fully English proficient. Clearly, many native-English-
speaking students are also not proficient on state content assessments. These students’ lack of academic
proficiency may not be related at all to their English-language skills. ELs, therefore, must have sufficient
ability in academic English to meaningfully participate in the classroom and on content assessments.⁵

Empirically, researchers can define “English language proficient” as the point at which EL students’ 
academic content achievement assessed using English becomes less related to their ELP. That is, there is 
a point at which EL students have sufficient English language skills to adequately function in English on 
content assessments; accordingly, there should be observable decreases in the relationship between the 
two assessments. At or beyond this point is where the ELP performance standard might be considered, 
and empirical procedures can help to identify this level of performance. Because academic language 
demands vary by academic content area and grade level, this performance point will likely vary as well. 
Yet state policymakers are usually required to select one ELP performance level. They, therefore, need to 
examine the data carefully to clarify tradeoffs and attempt to make an optimal decision. 



3. Although ESEA Title I specifically requires states to set a goal of 100 percent of all students’ attaining academic
proficiency in reading or language arts and mathematics by the 2013-14 school year, recent policy discussions regarding 
ESEA reauthorization emphasize more empirically based growth models and performance targets (see Linn 2008). 

4. See Stevens and others (2000), Butler and Castellon-Wellington (2000 and 2005), Kato and others (2004), Francis
and Rivera (2007), Parker, Louie, and O’Dwyer (2009), Cook and others (2009), and Taylor and others (forthcoming). 

5. Correspondingly, some EL policy experts argue that the academic performance of ELs attaining ELP should be 
comparable to that of their native-English-speaking counterparts, notwithstanding the fact that EL students generally 
experience higher poverty rates and often have proportionally fewer instructional resources (Working Group on ELL 
Policy 2010). 



Key Approaches 



We used three approaches to explore the relationship between ELP level and meeting the state’s grade- 
level performance standard on academic-content assessments: 

1. Decision consistency analysis, which analyzes linguistic and academic proficiency-level 
categorizations and seeks to optimize consistent categorization of ELP students at the state’s 
preestablished academic proficient cut score. 

2. Logistic regression analysis, which estimates the probability of being proficient on academic-content 
assessments for each ELP score.7 This approach could identify ELP scores for which 
students have a probability equal to or greater than 50-50 (0.5) of being proficient on the 
content assessment.8 

3. Descriptive box plot analysis,9 which identifies the ELP level at which at least half the 
assessed EL students are above the academic-content proficient score cut point. At this point, 
students equally distribute above and below the state’s proficient performance standard in 
academic content, which may suggest that, above this point, more than just language proficiency 
is contributing to observed scores. 

Taken together, these three approaches provide multiple sources of evidence to investigate and 
corroborate the point at which an ELP performance standard might be set. All three approaches should 
be used, when feasible, in order to provide policymakers with more complete, possibly “triangulated” 
empirical evidence for delimiting a range of performance and defining options to establish an ELP 
performance standard for ELs. 

The following section applies these analytic methods to examine data for EL students within each of 
three very different states at different grade levels for two academic years. Specifically, the analyses 
illustrate grade 4 outcomes from Education Agency 1 for 2007-08 and 2008-09, grade 7 outcomes from 



6. As this approach is relatively new, a detailed description and step-by-step illustration of it is provided in 
Appendix A. 

7. For this approach, the outcome is a dummy indicator that equals 1 if the student is proficient on 
academic-content assessment and 0 otherwise. The predictors of the model are dummy indicators, which take the value 
of 1 if the student’s score falls within the particular ELP proficiency level and 0 otherwise. 

8. The English-proficient performance standard is conceptualized here as the point where students’ language 
proficiency becomes less related to content proficiency. A 50-50 criterion is selected because students with ELP scores 
at this level have an equal likelihood of attaining content-area-proficient performance. ELP assessments are not designed 
or intended to strongly predict content proficiency per se but rather to ascertain if students have the requisite language 
needed to meaningfully participate in academic-content-area learning using English. Certainly, a different criterion could 
be adopted. States should be careful, however, not to set the criterion too high. If our assumption about the 
English-proficient performance standard is correct, then ELP assessments will be less predictive of content proficiency 
at higher ELP levels. This imprecision would add greater error and might lead to inaccurate inferences. The imprecision 
might also lead to establishing higher content performance expectations for EL exit decisions than the performance 
currently attained by non-EL students. 

9. A box plot shows graphically five-number summaries: the smallest observation, lower quartile, median, upper 
quartile, and largest observation. The box plot graph may also indicate outlier observations. For simplicity, however, the 
following graphs do not show outlier observations. 

10. There are significant concerns (including noncomparability) surrounding states’ selected academic-proficient 
performance standards for high-stakes accountability purposes (Ho 2008; Dietz 2010). However, the present study 
assumes that the sample states’ academic-proficient performance standards are rigorous and defensible. 

11. Results from both years’ analyses were very similar within each state; therefore, only the more recent year is 
discussed here, and all results for both years analyzed are provided in appendixes B-D. 
Education Agency 2 for 2007-08 and 2008-09, and grade 10 outcomes from Education Agency 3 for 
2009-10 and 2010-11. 

Each “worked example” is illustrated in turn, employing all three methods to explore ranges of EL 
performance on the ELP assessment and the relationship of these ranges to content-area assessment 
performance. The purpose of doing this is to provide input from multiple methods to inform policy 
discussions on determining an acceptable ELP performance standard. 

State results are shown to illustrate how the different methods are used together to examine relationships 
between assessment results within a state, and how policymakers can interpret these results to establish 
options. Readers should not compare outcomes across states: Each state has different content and ELP 
assessments based on state-specific content and performance standards, and each state administers its 
assessments at different times and utilizes different scaling and equating methodologies. Thus, drawing 
comparisons and inferences across states would be inappropriate and misleading. 

Example 1. Education Agency 1 

This first example shows results from Education Agency 1. Given in grades 2 through 11, the state’s 
academic content test assesses English or language arts (ELA) and mathematics, and is administered in 
the spring of each school year. It provides five content performance standard categories: Far Below 
Basic, Below Basic, Basic, Proficient, and Advanced. The state ELP assessment is administered from 
mid-summer through early fall of each school year. This assessment provides five ELP performance 
levels: Beginning, Early Intermediate, Intermediate, Early Advanced, and Advanced.12 

Method A. Decision Consistency Analysis 

As explained, the decision consistency analysis (Exhibit 1, below) illustrates the cumulative percentage of 
consistent decisions derived from ELP and academic-content-assessment classifications. Assessment 
results provided are from fourth-grade students who took both exams in the same academic year. The 
objective of this approach is to identify the ELP performance level at which the maximum percentage of 
consistent decisions is found (see Exhibit 1, below). 

As described in Appendix A, the first step in a decision consistency analysis is to create comparative 
bands for more refined analysis, as performance levels on standardized large-scale assessments can 
represent a broad range of performance. Accordingly, 10 comparative bands are created from the state 
ELP assessment composite proficiency scores to provide sufficient gradation for analysis: Beginning 
Low, Beginning High, Intermediate Low, Intermediate High, and so on. For each band, a decision 
consistency value is calculated, and those values are plotted in Exhibit 1. In this example, the greatest 
percentage (78 percent) of consistent decisions for fourth-grade ELs in ELA is obtained at the Early 
Advanced Low (Early Adv Low) performance level, while for mathematics, the greatest percentage 
(66 percent) occurs in the upper half of the scale score range of the Intermediate High (Int High) 
performance band. This analysis suggests that state policymakers consider the low end of Early 
Advanced as a possible range for optimizing consistent decisions, given the state’s current academic 



12. The scale score ranges for the Beginning, Early Intermediate, and Intermediate proficiency levels of the overall 
scores for grade 4 are 230-432, 433-472, and 473-530, respectively. The ranges for the Early Advanced and Advanced 
proficiency levels are 531-574 and 575-700, in that order. 
performance standard at this grade level.13 Relevant grade levels can be analyzed similarly, and results 
from analyses can be aggregated and presented for deliberations. 
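
The search for the ELP cut point that maximizes consistent decisions can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure documented in Appendix A: the function names and the ten student records are hypothetical, and the sketch assumes that a decision counts as "consistent" when a student is at or above both the candidate ELP cut and the academic proficient standard, or below both (taken here to correspond to QII and QIII in the decision consistency formula). The candidate cuts are the grade 4 level minimums reported in note 12.

```python
def decision_consistency(elp_scores, content_proficient, cut):
    """Share of students classified the same way by both assessments.

    Assumption: a decision is consistent when the student is at or above
    the ELP cut AND academically proficient, or below the cut AND not
    proficient.
    """
    consistent = sum(
        1
        for elp, proficient in zip(elp_scores, content_proficient)
        if (elp >= cut) == proficient
    )
    return consistent / len(elp_scores)


def best_cut(elp_scores, content_proficient, candidate_cuts):
    """Return (cut, rate) for the candidate cut with the highest rate."""
    rates = [
        (cut, decision_consistency(elp_scores, content_proficient, cut))
        for cut in candidate_cuts
    ]
    return max(rates, key=lambda pair: pair[1])


# Hypothetical records: ELP composite scale score and whether the student
# met the academic proficient standard. These are not study data.
elp = [420, 450, 480, 500, 520, 540, 550, 560, 580, 600]
proficient = [False, False, False, True, False, False, True, True, True, True]

# Grade 4 level minimums (Early Intermediate through Advanced).
cuts = [433, 473, 531, 575]

for cut in cuts:
    print(cut, decision_consistency(elp, proficient, cut))

print("best:", best_cut(elp, proficient, cuts))
```

With these made-up records, the highest rate (80 percent) falls at a cut of 531, the Early Advanced minimum. An actual analysis would evaluate the finer comparative bands described above rather than only the level minimums.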



Exhibit 1. 

Education Agency 1, Grade 4: ELP and English or Language Arts and Mathematics 

Decision Consistency Analysis (2007-08) 




[Figure legend: Grade 4 ELA; Grade 4 Math] 



Exhibit reads: The percent of consistent decisions obtained for fourth-grade EL students in mathematics 
up through the High Intermediate ELP performance level is 66 percent. In English or language arts, the 
cumulative rate up through the lower Early Advanced scale score range is 78 percent. 

Notes: The composite ELP scale score ranges for the Beginning (level 1), Early Intermediate (level 2), and 
Intermediate (level 3) proficiency levels of the overall scores for grade 4 are 230-432, 433-472, and 473-530, 
respectively. The ranges for the Early Advanced (level 4) and Advanced (level 5) proficiency levels are 531-574 
and 575-700, in that order. The scale score point at the midpoint of each proficiency level range is then 
demarcated. Students below the midpoint are classified as “low.” Students at or above that point are classified 
as “high.” Appendix Exhibits B.5 and B.6 present the number of students with ELP and English or language 
arts and mathematics proficiency level scores and the percentage of consistent decisions. 

Decision consistency formula: DC% = (QII + QIII) / (QI + QII + QIII + QIV). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



13. As noted above, state policymakers are usually required to select one English-language-proficient performance 
standard to apply to ELs across all grade levels, and so would identify the higher ELP performance standard to ensure 
adequate performance on ELA, as well as mathematics. 
Method B. Logistic Regression Analysis 



The logistic regression (probability) curves in Exhibit 2 (below) illustrate the likelihood of scoring at or 
above the academic proficient performance standard, as currently defined by the state for ELA and 
mathematics, respectively, as a function of increasing composite ELP scale scores. The horizontal line in 
the middle of each exhibit marks the point at which there is an equal (50-50) probability of attaining that 
academic performance standard, while the vertical lines mark the scale score cut points that distinguish 
the five ELP performance levels on the state ELP assessment. Similar to the decision consistency 
analyses, these regression analyses suggest that a state ELP assessment scale score of 545 points, 
in the lower half of the Early Advanced ELP performance level, would be sufficient to obtain that 
likelihood of performance in ELA, and that a score of 501 points, at the midpoint of the Intermediate 
ELP level, would be sufficient in mathematics. This approach yields corroborating evidence and 
would therefore increase confidence in suggesting the lower end of Early Advanced as a range to 
consider for the English-proficient performance standard. 
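
To make the 50-50 criterion concrete, the sketch below fits a one-predictor logistic curve and solves for the ELP scale score at which the estimated probability of academic proficiency equals 0.5. It is an illustration under stated assumptions, not the study's estimation code: the function names are hypothetical, the fit uses a plain Newton-Raphson routine written with the Python standard library, and the eight records are invented so that the data are symmetric around a score of 500. It does not reproduce the 545-point or 501-point results reported above.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    The predictor is standardized internally for numerical stability;
    the returned coefficients are on the original scale of x.
    Assumes the data are not perfectly separable.
    """
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    z = [(v - mean) / sd for v in x]

    a0, a1 = 0.0, 0.0  # intercept and slope on the standardized scale
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * zi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * zi     # gradient, slope
            h00 += w                # information matrix entries
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det

    b1 = a1 / sd
    b0 = a0 - a1 * mean / sd
    return b0, b1


def score_at_probability(b0, b1, p=0.5):
    """ELP score at which the fitted probability equals p."""
    return (math.log(p / (1.0 - p)) - b0) / b1


# Hypothetical records (not study data): ELP composite scale score and a
# dummy indicator equal to 1 if the student is academically proficient.
elp = [440, 460, 480, 500, 500, 520, 540, 560]
proficient = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(elp, proficient)
print("50-50 ELP score:", round(score_at_probability(b0, b1), 2))
```

Because the 50-50 point is where the linear predictor equals zero, it reduces to -b0 / b1; passing a different `p` shows how a stricter criterion would shift the implied ELP score upward.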
Exhibit 2. 

Education Agency 1, Grade 4: Logistic Regression Plots 
for English or Language Arts and Mathematics (2007-08) 

[Figure: two panels, “Grade 4, 2008 ELA Logistic Plot” and “Grade 4, 2008 Math Logistic Plot.” Vertical axis: estimated probability. Horizontal axis: ELP composite scale score (ELPCompositeScaleScore08).] 

ELP performance levels for grade 4 (in scale score points) 

Level 1 = Beginning (230-432) 
Level 2 = Early Intermediate (433-472) 
Level 3 = Intermediate (473-530) 
Level 4 = Early Advanced (531-574) 
Level 5 = Advanced (575-700) 

Exhibit reads: There is an equal probability for grade 4 EL students who score in the lower half of the Early 
Advanced scale score range to achieve the proficient performance standard in English or language arts, 
specifically, when obtaining 545 scale score points. 

Notes: The plot represents the estimated logistic curve where the outcome is the dummy indicator English or language 
arts proficient and the predictor is the continuous ELP composite scale score. The logistic regression 
(probability) curves illustrate the likelihood of scoring at or above the academic proficient performance 
standard, as currently defined by the state for ELA and mathematics, respectively, as a function of increasing 
composite ELP scale scores. 

The vertical dashed lines correspond to minimum scale scores for Early Intermediate, Intermediate, Early 
Advanced, and Advanced proficiency levels. 

The point estimate and standard error are presented in Appendix Exhibit B.8. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 
Method C. Descriptive Box Plot Analysis 

Finally, the box plot analyses for ELA and mathematics (Exhibit 3) show the distribution of scale score 
performance in those academic subject areas, respectively, for students at each of the state ELP 
assessment’s five composite ELP performance levels. The data reveal findings congruent with the 
decision consistency and logistic regression analyses. Specifically, as seen in Exhibit 3 (below), EL 
students performing at the Early Advanced ELP performance level (or level 4) have a median ELA 
performance of 355 scale score points (just above the proficient standard in ELA), and more than 
50 percent of ELs at this ELP performance level attain this academic performance standard. For 
mathematics (Exhibit 3), a similar result (median performance of 350 scale score points) occurs for ELs 
at the Intermediate ELP performance level (or level 3), indicating that exactly half the EL students at this 
ELP performance level attained the mathematics academic performance standard. 
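
The box plot reading can be approximated numerically with a five-number summary per ELP level, followed by a check of which level is the lowest whose median content score reaches the proficient standard. The sketch below is illustrative only: the function names and the score lists are hypothetical, and the proficient cut of 350 is inferred from the exhibit statement that a median of 355 is five points above the standard. Quartile conventions differ across software, so the "inclusive" method used here is one choice among several.

```python
from statistics import quantiles


def five_number_summary(scores):
    """Smallest value, lower quartile, median, upper quartile, largest value."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, median, q3, max(scores)


def lowest_level_meeting_standard(content_by_level, proficient_cut):
    """Lowest ELP level whose median content score is at or above the cut.

    A median at or above the cut means at least half of the students at
    that ELP level meet the academic proficient standard.
    """
    for level in sorted(content_by_level):
        median = five_number_summary(content_by_level[level])[2]
        if median >= proficient_cut:
            return level
    return None


# Hypothetical content scale scores grouped by ELP level (not study data).
ela_by_level = {
    1: [250, 270, 290, 300, 310],
    2: [280, 300, 315, 330, 345],
    3: [300, 320, 340, 350, 370],
    4: [320, 340, 355, 370, 390],
    5: [350, 370, 385, 400, 420],
}

PROFICIENT_CUT = 350  # inferred from the exhibit text; treat as an assumption

for level, scores in ela_by_level.items():
    print(level, five_number_summary(scores))

print("lowest level meeting standard:",
      lowest_level_meeting_standard(ela_by_level, PROFICIENT_CUT))
```

In this invented data set the median first reaches the cut at level 4, which mirrors the pattern described for ELA but is not derived from the study's records.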



Exhibit 3. 

Education Agency 1, Grade 4: Box Plots of English or Language Arts 
and Mathematics Scale Scores, by ELP Performance Level (2007-08) 

[Figure: two box plot panels, distribution of state content language arts scale scores by ELP performance level (ELP_cat08) and distribution of state content mathematics scale scores by ELP performance level (ELP_cat08).] 

ELP performance levels for grade 4 (in scale score points) 

Level 1 = Beginning (230-432) 
Level 2 = Early Intermediate (433-472) 
Level 3 = Intermediate (473-530) 
Level 4 = Early Advanced (531-574) 
Level 5 = Advanced (575-700) 



Exhibit reads: Grade 4 EL students performing at the Early Advanced ELP performance level (or level 4) 
have a median performance of 355 scale score points in English or language arts, five points above the 
proficient performance standard. 

Note: Appendix Exhibit B.10 presents the descriptive statistics associated with this graph. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 






For Education Agency 1, results of all three methods converge and suggest that policymakers might 
consider setting the ELP performance standard in the Early Advanced range of the state ELP 
assessment. Doing so would maximize the percentage of consistent decisions relative to the students’ 
ELA performance, and would also, per regression analysis, yield a 50 percent or greater probability 
that EL students attaining this level of language proficiency on the state ELP assessment would also 
attain the current proficient standard on the state’s ELA test. As corroborated by the box plots, slightly 
more than 50 percent of EL students at the Early Advanced level on the ELP assessment exceed the 
proficient performance standard on the state’s ELA test. 

Performance on mathematics appears to require less ELP, as consistent decisions are maximized in the 
upper half of Intermediate on the state ELP assessment, with the logistic regression curve indicating that 
the fourth-grade EL students above the midpoint of the Intermediate level exceed the 0.50 probability of 
attaining mathematics proficiency. The box plots indicate that fully 50 percent of students at the 
Intermediate ELP level attain proficiency on mathematics in the fourth grade. Since policymakers must 
choose only one performance standard for the ELP level, these analyses suggest a need to consider the 
higher ELP level indicated by the ELA analyses. 

Example 2. Education Agency 2 

The state content test used for Education Agency 2 is administered in grades 3 through 8 in the areas of 
literacy and mathematics in the spring of each school year. These assessments provide four performance 
categories: Below Basic, Basic, Proficient, and Advanced. Education Agency 2 is one of a consortium of 
states that shares an English language-proficiency assessment. This assessment is administered in the late 
spring of each school year and provides five ELP performance levels: Pre-Functional, Beginning, 
Intermediate, Advanced, and Fully English Proficient. 

Because the ELP assessment composite proficiency levels are not created from composite scale scores, 
creating more than five comparative bands of performance for decision consistency analysis is not 
possible. Therefore, the ELP assessment’s five proficiency levels are used. 

Method A. Decision Consistency Analysis 

Decision consistency analyses for Education Agency 2’s EL students in 2008 (Exhibit 4, below) show 
that the greatest percentage of consistent decisions for seventh-grade ELs in literacy (in the exhibit 
legend, identified as ELA) (77 percent), as well as mathematics (76 percent), is obtained at the same ELP 
performance level, namely, the Advanced level on the ELP assessment, the fourth of five performance 
levels defined on this ELP test. This finding holds for both years examined (with slightly higher results, 
81 percent, for ELA occurring in 2007). In addition, the percentage of consistent decisions in 
Education Agency 2 at this grade level varies little between the academic subject areas. 



14. Composite scale scores are not aligned with the composite proficiency levels. While the former is created by 
averaging results from the four domains, the latter is generated using a weighting system that combines proficiency-level 
results from two synthetic categories — comprehension (composed of listening and reading domains) and production 
(composed of speaking and writing domains). 
Exhibit 4. 

Education Agency 2, Grade 7: ELP and Literacy/Mathematics 
Decision Consistency Analysis (2007-08) 

[Figure legend: Grade 7 ELA; Grade 7 Math] 



Exhibit reads: The percent of consistent decisions obtained for seventh-grade EL students in mathematics 
up through the Intermediate ELP performance level is 63 percent. In English or language arts, the 
cumulative rate up through the Advanced scale score range is 77 percent. 

Notes: ELP composite scale scores are not aligned to the composite proficiency levels in Education Agency 2. 
Although the former is created by averaging results from the four domains, the latter is generated using a 
weighting system that combines proficiency-level results from two synthetic categories: comprehension 
(composed of listening and reading domains) and production (composed of speaking and writing domains). 
Appendix Exhibits C.5 and C.6 present the number of students with ELP and English or language arts and 
mathematics proficiency-level scores and the percentage of consistent decisions. 

Decision consistency formula: DC% = (QII + QIII) / (QI + QII + QIII + QIV). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 
Method B. Logistic Regression Analysis 

The logistic regression (probability) curves for Education Agency 2’s seventh-grade EL students are 
displayed in Exhibit 5 (below). Given the way the ELP assessment constructs its overall composite scale 
scores, they are not used in this analysis. Instead, the ELP assessment reading scale score result is used, 
as the reading domain has been found to be the domain with highest predictive validity on some ELP 
assessments (e.g., see Parker, Louie, and O’Dwyer 2009). Like the decision consistency analyses, the logistic 
regression analyses suggest that a reading scale score of 784 points, in the upper portion of 
the Advanced ELP reading performance level, and a score of 734 points, in the lower portion of that 
level, are sufficient to obtain an equal likelihood of attaining the 
state's proficient performance standard in literacy and mathematics, respectively. Again, this 
corroboration strengthens confidence in pinpointing a range of ELP performance for the ELP 
performance standard. 



15. Composite performance levels are derived from a weighted combining of two synthetic categories — 
comprehension (composed of listening and reading domains) and production (composed of speaking and writing 
domains). Because these composite performance levels do not align with the composite ELP level scale score ranges, 
logistic regression analyses could not be conducted using the ELP assessment composite scale score. 
Exhibit 5. 

Education Agency 2, Grade 7: Logistic Regression Plots 
for Literacy and Mathematics (2007-08) 

[Figure: two panels, “Grade 7, 2008 ELA Logistic Plot based on ELP Reading domain” and “Grade 7, 2008 Math Logistic Plot based on ELP Reading domain.” Vertical axis: estimated probability. Horizontal axis: ELP reading scale score (ELPReadingScaleScore08).] 

ELP Reading performance levels for grade 7 (in scale score points) 

Level 1 = Pre-Functional (below 469) 
Level 2 = Beginning (469-618) 
Level 3 = Intermediate (619-690) 
Level 4 = Advanced (691-834) 
Level 5 = Fully English Proficient (835 or above) 

Exhibit reads: There is an equal probability for grade 7 EL students who score in the upper portion of the 
Advanced ELP performance level in reading to achieve the proficient performance standard in English or 
language arts, specifically, when obtaining 784 scale score points. 

Notes: The plot represents the estimated logistic curve from a model in which the outcome is the dummy indicator 
English or language arts and mathematics proficiency and the predictor is the continuous ELP reading scale 
score. The logistic regression (probability) curves illustrate the likelihood of scoring at or above the academic 
proficient performance standard, as currently defined by the state for ELA and mathematics, respectively, as a 
function of increasing composite ELP scale scores. 

The vertical dashed lines correspond to minimum ELP reading scale scores for Beginning, Intermediate, 
Advanced, and Fully English Proficient levels. 

The point estimate and standard error are presented in Appendix Exhibit C.8. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 
Method C. Descriptive Box Plot Analysis 

Finally, the box plot analyses for literacy and mathematics (Exhibit 6) for Education Agency 2’s 
seventh-grade EL students reveal findings consistent with the decision consistency and logistic 
regression analyses. Specifically, as seen in Exhibit 6 (below), EL students performing at the Advanced 
composite ELP performance level (or level 4) have a median literacy performance of 681 scale score 
points (a little higher than the proficient standard of 673), and slightly more than 50 percent of ELs at 
this ELP level attain this academic-performance standard. For mathematics (Exhibit 6), EL students 
performing at the Advanced composite ELP level attain a higher median performance (703 scale score 
points) and attain the academic performance standard at higher rates (nearly 75 percent of the EL students at 
the Advanced ELP level do so). 
Exhibit 6. 

Education Agency 2, Grade 7: Box Plots of Literacy 
and Mathematics Scale Scores, by ELP Performance Level (2007-08) 

[Figure: two box plot panels, distribution of state content literacy scale scores by ELP performance level (ELP_cat08) and distribution of state content mathematics scale scores by ELP performance level (ELP_cat08).] 

ELP composite performance levels 

Level 1 = Pre-Functional 
Level 2 = Beginning 
Level 3 = Intermediate 
Level 4 = Advanced 
Level 5 = Fully English Proficient 



Exhibit reads: Grade 7 EL students performing at the Advanced ELP level (or level 4) have a median 
performance of 681 scale score points in English or language arts, 8 points above the proficient performance 
standard. 



Note: ELP composite scale scores are not aligned to the composite proficiency levels. While the former is created by 
averaging results from the four domains, the latter is generated using a weighting system that combines 
proficiency-level results from two synthetic categories: comprehension (composed of listening and reading 
domains) and production (composed of speaking and writing domains). Appendix Exhibit C.10 presents the 
descriptive statistics associated with this graph. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 
In Education Agency 2, the percentage of consistent decisions tends to be higher for mathematics than 
for literacy at a given ELP level in seventh grade, but the percentage of consistent decisions for ELs in 
this grade is maximized at the same ELP reading level (Advanced) on the ELP assessment for both 
academic subjects. The logistic regression curves indicate that the 0.50 probability level for attaining 
academic proficiency in mathematics versus ELA is attained at a lower ELP assessment reading scale 
score (734 scale score points, versus 784 scale score points, respectively), but these are both within the 
Advanced ELP assessment reading level. The box plots for the same students also show that, while 
slightly more than 50 percent of those at the ELP assessment’s Advanced reading level score proficient 
on the state literacy test, close to 75 percent do so on the state mathematics exam. The evidence across 
these analytic methods triangulates, and suggests that policymakers should consider this range of 
performance on the ELP assessment as a starting point for discussion. 

Example 3. Education Agency 3 

Education Agency 3’s academic assessment tests students in reading and mathematics in grades 
3 through 8 and in grade 10. These assessments provide academic-proficiency performance-standard 
categories of Minimal, Basic, Proficient, and Advanced. These assessments are administered in mid-fall 
(November), to assess knowledge gained in the prior academic year. 

The ELP assessment used in Education Agency 3 is administered in the early fall and winter of each 
school year and provides six language-proficiency performance levels: Entering, Beginning, Developing, 
Expanding, Bridging, and Reaching. In addition, the ELP assessment provides composite proficiency 
scores in decimal values from 0.0 to 0.9 for each proficiency category (e.g., 3.0, 3.1, 3.2, 3.3). The 
decimals represent 10 equidistant scale score points between each proficiency level and the next. 

Method A. Decision Consistency Analysis 

To create bands for the decision consistency analysis, ELP assessment composite scores whose decimal 
portion is lower than 0.5 (e.g., 4.4) are categorized in the Low group. Thus the composite proficiency 
score of 4.4 would be categorized as Expanding Low. Scores from 4.5 through 4.9 would be Expanding 
High. Other proficiency bands are created similarly. 
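
The banding rule can be written as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, the level names and their numbering follow the six levels listed for this state, and scores are assumed to be reported to one decimal place.

```python
LEVEL_NAMES = {
    1: "Entering",
    2: "Beginning",
    3: "Developing",
    4: "Expanding",
    5: "Bridging",
    6: "Reaching",
}


def comparative_band(composite_score):
    """Map a composite proficiency score such as 4.4 to a Low/High band.

    The whole-number part selects the proficiency level; a decimal part
    below 0.5 is Low, and 0.5 or above is High. Working in tenths avoids
    floating-point surprises when extracting the decimal part.
    """
    tenths = round(composite_score * 10)
    level, decimal = divmod(tenths, 10)
    half = "Low" if decimal < 5 else "High"
    return f"{LEVEL_NAMES[level]} {half}"


print(comparative_band(4.4))  # Expanding Low
print(comparative_band(4.5))  # Expanding High
print(comparative_band(5.0))  # Bridging Low
```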
Exhibit 7. 

Education Agency 3, Grade 10: ELP and Reading/Mathematics 
Decision Consistency Analysis (2009-10) 

[Figure legend, as printed: Grade 4 ELA; Grade 4 Math] 



Exhibit reads: The percent of consistent decisions obtained for 10th-grade EL students in mathematics up 
through the Bridging Low ELP performance level is 75 percent. In English or language arts, the cumulative 
rate up through the Bridging Low level is 79 percent. 

Note: This state has six ELP levels: Entering, Beginning, Developing, Expanding, Bridging, and Reaching. The 
composite ELP scale score ranges for the Entering (level 1), Beginning (level 2), and Developing (level 3) 
proficiency levels of the overall scores for grade 10 are 100-332, 333-362, and 363-386, respectively. The 
ranges for the Expanding (level 4), Bridging (level 5), and Reaching (level 6) proficiency levels are 387-404, 
405-423, and 424-600, in that order. The scale score point at the midpoint of each proficiency level range is 
then demarcated. Students below the midpoint are classified as “low.” Students at or above that point are 
classified as “high.” Appendix Exhibits D.5 and D.6 present the number of students with ELP and reading and 
math proficiency-level scores and the percentage of consistent decisions. 

Decision consistency formula: DC% = (QII + QIII)/ (QI + QII + QIII + QIV). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 
Decision consistency analyses for Education Agency 3 EL students in 2010 (Exhibit 7) show that the 
greatest percentage of consistent decisions for 10th-grade ELs in reading (79 percent), as well as 
mathematics (75 percent), is obtained at the same ELP score range: Bridging Low on the ELP 
assessment. This pattern holds for both years examined. Also, the percentage of consistent decisions in 
Education Agency 3 in this grade is similar in the two academic subjects. 

Method B. Logistic Regression Analysis 

The logistic regression (probability) curves for Education Agency 3’s 10th-grade EL students, displayed 
in Exhibit 8 (below), yield results that differ from those of the decision consistency analyses for the same 
population. Specifically, the probability curves for both reading and mathematics yield a scale score value 
corresponding to the upper portion of the Expanding (i.e., Expanding High) composite ELP assessment 
performance level (398 and 399 scale score points, respectively), which is sufficient to obtain a 50-50 
likelihood of proficient performance on the state’s reading and mathematics content exams. Expanding 
is an ELP level lower than that predicted in the above decision consistency analyses. This kind of discrepancy requires 
careful interpretation and possibly further data analysis. 
Exhibit 8. 

Education Agency 3, Grade 10: Logistic Regression Plots for 
Reading and Mathematics (2009-10) 

[Figure: two panels, “Grade 10, 2010 Reading Proficient Logistic Plot” and “Grade 10, 2010 Mathematics Proficient Logistic Plot.” Vertical axis: estimated probability.] 

Exhibit reads: There is an equal probability for grade 10 EL students who score in the upper portion of the 
Expanding composite ELP performance level to achieve the proficient performance standard in English or 
language arts, specifically, when obtaining 398 scale score points. 

Note: The plot represents the estimated logistic curve from a model in which the outcome is the dummy indicator 
English or language arts and mathematics proficiency and the predictor is the continuous ELP composite scale 
score. The logistic regression (probability) curves illustrate the likelihood of scoring at or above the academic 
proficient performance standard, as currently defined by the state for ELA and mathematics, respectively, as a 
function of increasing composite ELP scale scores. 

The vertical dashed lines correspond to the cut scores between two continuous ELP performance levels such 
that a value of 333 is the point between Entering (Level 1) and Beginning (Level 2). 

The point estimate and standard error are presented in Appendix Exhibit D.8. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 





ELP composite performance levels 
for grade 10 (in scale score points) 

Level 1 = Entering (100-332) 
Level 2 = Beginning (333-362) 
Level 3 = Developing (363-386) 
Level 4 = Expanding (387-404) 
Level 5 = Bridging (405-423) 
Level 6 = Reaching (424-600) 

Method C. Descriptive Box Plot Analysis 

Finally, the box plot analyses for reading and mathematics (Exhibit 9) among Education Agency 3’s 
10th-grade EL students reveal findings consistent with the decision consistency method, although not 
with that of the logistic regression method. Specifically, as seen in Exhibit 9 (below), EL students 
performing at the Bridging composite ELP performance level (or level 5) on the ELP assessment have a 
median performance of 523 scale score points on the academic content exam (well above the proficient 
standard of 503 in reading), and about 75 percent of ELs at this Bridging performance level meet this 
academic performance standard. For mathematics (Exhibit 9), EL students performing at the Bridging 
composite ELP level on the ELP assessment attain a higher median mathematics performance (552 scale 
score points) but meet the mathematics performance standard at slightly lower rates (nearly 70 percent of 
the EL students at the Bridging ELP level do so). 



Exhibit 9. 

Education Agency 3, Grade 10: Box Plots of Reading 
and Mathematics Scale Scores, by ELP Performance Level (2009-10) 

[Box plots: distribution of state content reading and mathematics scale scores, by composite ELP performance level.] 

ELP composite performance levels 
for grade 10 (in scale score points) 

Level 1 = Entering (100-332) 
Level 2 = Beginning (333-362) 
Level 3 = Developing (363-386) 
Level 4 = Expanding (387-404) 
Level 5 = Bridging (405-423) 
Level 6 = Reaching (424-600) 



Exhibit reads: Grade 10 EL students performing at the Bridging composite ELP performance level 
(or level 5) have a median performance of 523 scale score points in English or language arts, 20 points higher 
than the proficient performance standard. 

Note: Appendix Exhibit D.10 presents the descriptive statistics associated with this graph. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Education Agency 3’s three analyses, therefore, yield a more complex picture. Specifically, the percentage 
of consistent decisions is maximized for 10th-grade EL students at the Bridging Low ELP level on the 
ELP assessment for both reading and mathematics academic subject areas. In contrast, the logistic 
regression curves indicate that the 0.50 probability level of attaining academic proficiency in the state’s 
10th-grade reading and mathematics tests is reached at the upper end of the scale score range defining the 
Expanding ELP level on the ELP assessment; yet less than 50 percent of EL students at the Expanding 
ELP level attain academic proficiency in reading or mathematics, as seen in the box plots; in fact, only 
about 27 percent of students at the Expanding level attain academic proficiency in either reading or 
mathematics. 

In this example, how might analysts informing policymakers proceed? Here, the apparent discrepancy 
between the outcomes of the logistic regression method and those of the other two methods is resolved 
through more careful review of the empirical evidence. Specifically, the steep slopes seen in the logistic 
regression curves within the scale score range of the Expanding ELP level on the ELP assessment suggest 
that performance on Education Agency 3’s 10th-grade academic content assessments is particularly 
sensitive to linguistic gains in this range of the ELP test. Indeed, the logistic regression curves estimate 
that an EL student at the high end of Expanding is much more likely to attain academic proficiency than 
a student at the low end of Expanding. For example, for Education Agency 3’s 10th-grade reading 
academic assessment in 2010, the 
probability runs from 25 percent at the lower end (385 scale score points) to 62 percent at the upper end 
(405 scale score points) of the Expanding ELP level on the ELP assessment; for mathematics, the 
probability of academic proficiency runs from 32 percent to 59 percent for the same scale score point 
range of Expanding on the ELP assessment. Because the box plot analyses indicate that nearly 
75 percent of EL students at the Bridging ELP level on the ELP assessment meet the proficient 
performance standards for reading and mathematics content assessments, the preponderance of evidence 
suggests that policymakers should consider setting the ELP performance standard somewhere between 
the Expanding High and Bridging Low ELP levels on the state’s ELP assessment. Next steps would 
involve analysis of additional grades, to increase confidence in recommending a particular performance 
standard. 
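
The relationship between the quoted endpoint probabilities and the 0.50 crossover point can be illustrated with a short calculation. The sketch below is an illustration only: it back-solves a two-parameter logistic curve from the two probabilities quoted above for each subject (at 385 and 405 scale score points), rather than using the fitted estimates reported in Appendix Exhibit D.8, and all function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def curve_through(x_low, p_low, x_high, p_high):
    """Intercept and slope of the logistic curve passing through two points.

    Assumption: a simple two-parameter logistic model, back-solved from two
    quoted probabilities; these are NOT the report's fitted coefficients.
    """
    slope = (logit(p_high) - logit(p_low)) / (x_high - x_low)
    intercept = logit(p_low) - slope * x_low
    return intercept, slope


def probability(score, intercept, slope):
    """Estimated probability of academic proficiency at an ELP scale score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def crossover(intercept, slope, target=0.5):
    """ELP scale score at which the curve reaches the target probability."""
    return (logit(target) - intercept) / slope


# Endpoint probabilities quoted in the text for the Expanding range.
reading_curve = curve_through(385, 0.25, 405, 0.62)
math_curve = curve_through(385, 0.32, 405, 0.59)

reading_crossover = crossover(*reading_curve)
math_crossover = crossover(*math_curve)

print(f"Reading 0.50 crossover: {reading_crossover:.1f} scale score points")
print(f"Mathematics 0.50 crossover: {math_crossover:.1f} scale score points")
```

Under these assumptions, both crossover points fall between 398 and 400 scale score points, that is, in the upper portion of the Expanding level, which is broadly in line with the values of 398 and 399 reported for the fitted curves.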

Summary 

The three examples highlight how these distinct analytic methods can assist state policymakers in 
examining empirical evidence on EL student academic performance relative to ELP level, and provide 
input for policy decisions on determining an acceptable ELP performance standard. Even when utilizing 
these methods in very different “worked examples,” on EL student outcomes from each of three states 
over two academic years at three different grade levels, these methods yield largely convergent results 
within each Education Agency, although results for each require different degrees of interpretation. 

In effect, these empirical methods are tools to gather empirical evidence to inform and support policy 
discussions. They are not intended to mechanically determine or constrain policy decisions. Moreover, 
for simplicity of presentation, analyses were performed on grade-level data. In a complete application, 
states would review the empirical outcomes of all grades for which data are available. In addition, states 
could choose to aggregate these data into grade spans (e.g., 3-5, 6-8, 9-12) in order to examine 
performance patterns by elementary, middle, and high school segments, respectively. As noted 
previously, policymakers are usually constrained to choose a single English-language-proficient 
performance standard. The evidence presented above suggests that these three analytic methods used in 
conjunction can greatly assist policymakers in conducting informed discussions and making defensible 
decisions. While the three methods presented here provide useful information, they are intended to 
stimulate reflection and do not represent all analytic possibilities. In fact, this work should motivate more 
exploratory research in this area to generate new, more powerful analytic and visual approaches. 

Caveats 

One key caveat that policymakers should keep in mind as they review state analyses concerns the timing 
of administration of the ELP and academic content assessments. Specifically, in instances such as 
Education Agencies 1 and 3, where the ELP assessment is administered several months before the 
academic achievement assessment, EL students’ level of English language proficiency when they take 
the academic content assessment may be quite different from that indicated by their ELP assessment result. 
Furthermore, the direction of this difference may vary, in part, according to the exact time of year when 
each assessment is given, and the linguistic environment of the EL student population. 

In Education Agency 1, because the state ELP assessment is given July through October and the 
academic assessment is given the following spring, one could argue that the academic achievement 
results for a given ELP level will be systematically overstated relative to performance on concurrent 
assessments, since EL students will have received almost an entire year of instruction in English as a 
second language or English language development and will likely be at a higher ELP level when they take 
the academic content assessment. In Education Agency 3, where the ELP assessment is given in late 
spring and the academic achievement assessment is administered in the following fall, one might imagine 
that the same phenomenon as that just described could result. However, if some EL students spend the 
summer months in isolated linguistic communities, with little or no exposure to native English speakers 
and English texts, it is possible that this overestimation effect is canceled out, or even that an 
underestimation effect is generated. Given the complexities of interpreting these effects for different 
populations and contexts, findings from these analytic approaches will be more robust and interpretable 
to the extent that the ELP and academic content assessments are administered in closer time proximity. 

Another, related issue concerns the possible systematic exclusion of data of students tested on the ELP 
and academic assessments at different times. For example, since Education Agency 1 administers the 
state ELP assessment in the fall, some of the higher performing EL students in Education Agency 1 had 
their language classifications changed from EL to reclassified fully English proficient (RFEP) between 
the administration of the state ELP assessment in the fall and the state’s academic content tests in the 
following spring. Because results for all annual state ELP assessment examinees were available, these 
students (listed as RFEP in the state academic content test data file for the same academic year) could be 
identified and included in the analysis. Data analysts assisting states to utilize these analytic techniques 
should investigate these kinds of issues and address them as possible. 

This last caveat raises a more general issue for analysts and decision-makers to keep in mind. As an 
empirical fact, states have a given EL population at a given point in time according to the current entry 
and exit criteria they employ. Any empirical exploration to define (or redefine) the English-language 
proficient level will be affected by the current EL classification or reclassification criteria that define the 
population of a given state. 

For example, states currently using an ELP performance standard below the highest possible 
performance level will lose ELP assessment data on students above that cut point. Moreover, in states 
with multiple reclassification criteria that include academic achievement measures, ELs at higher ELP 
levels (including the state’s currently defined English-language-proficient level) who also perform higher 
academically are more likely to exit EL status and therefore no longer be assessed on the state’s language 
proficiency assessment. This means that even students who remain EL at higher ELP levels in these 
states are likely to perform less well academically by definition. Such censoring can result in a systematic 
underestimation of academic performance of ELs because it reports (at the higher grades especially) only 
the results of those who remain EL (often long-term ELs) as well as the results of newcomers. This 
skimming bias is now more widely recognized, and it particularly affects accountability decisions on 
overall EL subgroup performance (see Working Group on ELL Policy 2010). The skimming bias also 
likely yields empirical findings suggesting higher ELP assessment cut points for the 
English-language-proficient level at high school grade levels. However, decision-makers generally reject such an extreme 
for the state performance criterion, as results from elementary and middle school grades tend to be more 
consistent and signal a lower ELP performance standard. 

III. Establishing a Time Range for English 
Learners to Attain an 
English-Language-Proficient 
Performance Standard 



Overview 

In defining accountability provisions regarding English Learners (ELs) attaining English proficiency 
(i.e., annual measurable achievement objectives, or AMAOs), federal law specifically mentions time: 

Such annual measurable achievement objectives shall be developed in a manner that 
reflects the amount of time [italics added] an individual child has been enrolled in a 
language instruction educational program. §3122(a)(2)(A). 

The statute effectively requires states to pay careful attention to the amount of time ELs are expected to 
be in language instruction educational programs. This in turn has implications for states in setting annual 
progress expectations (AMAO 1) and establishing time frames to attain English-language proficiency 
(AMAO 2). Because state accountability systems need to reflect such expectations, this chapter illustrates 
approaches for establishing a reasonable yet rigorous time frame for ELs to attain the proficient 
performance standard on state ELP assessments. 

Given its importance, empirical research on this topic has been surprisingly limited, but is nevertheless 
instructive. For example, Hakuta and others (2000, 13) examined this question and concluded that 

even in districts that are considered the most successful in teaching English to EL 
students, . . . [attaining] academic English proficiency can take 4 to 7 years. 

Others have derived from empirical research similar time estimates (e.g., Genesee et al. 2006; Linquanti 
and George, 2007; Cook et al. 2008; Taylor et al. forthcoming). What emerges from these studies is that 
time frames to reach ELP vary based on several factors (e.g., initial English-proficiency level, particular 
language domain(s) assessed; age or grade on entry; primary language literacy level; type of 
language-proficiency assessment; background of EL students within a school, district or state; 
instructional program goals) with estimated time frames ranging from three to seven years. 

Given the variety of factors that influence time to attain English proficiency, states should conduct 
empirical analyses of their own existing data to derive a time-to-English-proficiency expectation for their 
EL populations. In doing so, the states need to consider several issues: 

1. Longitudinal data is often limited. Many states have a limited number of years of 
standardized longitudinal data for ELs (e.g., three to five years). In these cases, many students in 
a state’s data set have not yet attained ELP. In calculating time to an English-proficient 
performance standard, how does a state account for students who have not yet attained the 
standard? For example, if there are 1,000 EL students in the first year of a record system and 250 
of those have not attained English proficiency after five years of language instruction, how does 
a state calculate time to English proficiency? Using only the 750 students who have attained the 
performance standard to calculate time to ELP will generate an underestimate of how long it 
actually takes, as the analysis effectively excludes the 250 students who have not yet attained the 
goal. The not-yet-English-proficient students need to be accounted for in any method calculating 
time to ELP. 

2. State policies defining ELP evolve over time. Proficiency standards change; ELP assessments 
change or are restandardized; and criteria defining ELP performance levels are adjusted. The 
expected English-proficient standard can therefore change for different cohorts of EL students, 
and the resulting “move in the goal post” can make year-to-year comparisons problematic. 

3. EL language proficiency growth rates vary significantly. Empirical studies of EL student 
growth on ELP assessments (Cook et al. 2008; Linquanti et al. forthcoming) illustrate a general 
pattern: Students starting at lower proficiency levels will likely take longer to attain the 
English-language-proficient performance standard than will students starting at higher proficiency levels. 
Also, students of equivalent levels of language proficiency at higher grade levels are likely to take 
longer to attain the standard than their counterparts in lower grade levels. Given this general 
pattern, time frames to attain ELP may be sensitive to ELs’ initial English-proficiency levels and 
their grade span.¹⁶ 

4. Data are often missing for EL students. EL student records often have missing data, even 
when these should be available. For example, EL students may have three years of ELP 
assessment information, but the subsequent two years of records lack information. Some of 
these students may have left the district, state or country, while others continue to be enrolled 
but do not have current records. Should the available information be discarded or incorporated? 
If the latter, what methods should be used to incorporate the data in ways that do not distort the 
time estimates? 
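
The underestimation described in the first issue above can be made concrete with a small calculation. The sketch below is a hypothetical illustration only: the counts of 1,000, 750 and 250 students come from the example in the text, but the year-by-year split of the 750 students is invented for illustration and is not drawn from any state's data.

```python
# Hypothetical split of the 750 students who attained the standard within
# five years (invented numbers; only the total of 750 comes from the text).
attained_by_year = {1: 100, 2: 200, 3: 200, 4: 150, 5: 100}
not_yet_attained = 250   # students still not proficient after five years
years_observed = 5

n_attained = sum(attained_by_year.values())
student_years = sum(year * n for year, n in attained_by_year.items())

# Naive estimate: average time among only the students who attained.
naive_mean = student_years / n_attained

# Each not-yet-proficient student needs at least one more year than was
# observed, so this is a lower bound on the true average for all students.
lower_bound_mean = (
    student_years + not_yet_attained * (years_observed + 1)
) / (n_attained + not_yet_attained)

print(f"Mean time, attainers only: {naive_mean:.2f} years")
print(f"Lower bound, all students: {lower_bound_mean:.2f} years")
```

Even under the most optimistic assumption for the remaining 250 students, the average for the full group exceeds the attainers-only average, which is why methods that account for not-yet-proficient students are needed.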

Certainly there are other issues that arise when attempting to determine time to ELP. The point is to 
identify the most salient ones and offer methods that adequately address these issues in answering the 
time-to-proficiency question. 

Key Approaches 

We present two approaches for establishing a target time frame for ELs to attain a preidentified ELP 
performance standard: 

1. Descriptive analysis, which follows over time EL students who start at a prespecified date at 
varying English-proficiency levels. The proportions of EL students who annually attain the ELP 
criterion are then shown in a bar chart. The goal of this approach is to get a sense of percentages 
attaining language proficiency, by time, initial ELP level and grade span. 

2. Event history analysis, which is also known as survival analysis,¹⁷ is used extensively in the 
fields of engineering and medicine to estimate the time required for an event of interest to occur 
(Klein and Moeschberger 1997).¹⁸ For analyses here, the event is an EL student’s attaining the 
given ELP performance standard. The goal of this approach is to calculate a time frame that 
incorporates students for whom the event of interest does not occur. The following section 
applies these analytic approaches to a robust EL data set, in order to generate estimates that 
suggest a range of options and then illustrate how these can be applied by decision-makers. 

16. However, current federal regulations interpreting Title III permit states to set target performance percentages 
for cohorts of ELs only on the basis of their time in U.S. schools. 

17. The term “survival function” is used here; yet the values presented in the following analyses are linear 
transformations of the survival function (called the failure rate) and are calculated as 1 minus the survival function. The 
event of interest represents a positive outcome for ELs, while “surviving” means not experiencing that event. 

18. In engineering, that event is often the failure of a mechanical part or component. In medicine, it is often the 
death of a patient. Covariates can be used with this procedure to address such questions as, “If this metal is used, how 
much longer will it be before this component fails?” or “If patients receive this treatment, will their lives be extended? If 
so, by how much?” 

As in the previous chapter, these methods are recommended for use in a deliberative process by expert 
stakeholders empaneled to make recommendations by considering empirical data from the population of 
interest, in addition to their experiences. The methods effectively offer two information points for 
consideration. The following section applies these analytic methods to examine data for EL students 
within one education agency, referred to as Education Agency 1. 

Example: Education Agency 1 

Student data from Education Agency 1 (EA 1) are used to model both the descriptive and the event 
history analysis approaches. This education agency was chosen because relevant data were available for 
five years. ELs from kindergarten to fifth grade are included in these analyses. The sample was restricted 
to these grades for convenience and for illustration purposes. Because samples are reduced when ELs are 
divided by grade and ELP levels, results are presented by grade clusters, combining the data from grades 
K-2 and grades 3-5. Doing so provides some sense of how time estimates can vary by grade span. 

From these clusters, only students first designated as ELs between July and October 2003 (the ELP 
testing window for this education agency) were selected. Although there are small numbers of students 
entering and identified as ELs later in the school year (e.g., spring 2004), confining the sample to the July 
to October testing window permits a clearer interpretation of how many years it takes to attain the 
English-proficient level according to the given ELP assessment’s performance standard. 

The 2003-04 school year was used as the starting point because this is the first year for which data are 
available for EA 1. EA 1 has EL student records from 2003-04 to 2007-08 (i.e., five successive 
years). The state’s ELP assessment defines the English-proficient performance standard using a 
conjunctive approach: Students must attain an overall composite score of 4 and have no domain score 
(i.e., reading, writing, speaking or listening, each weighted 25 percent) lower than 3. For both analytic 
approaches shown, EL students meeting this performance standard on the ELP assessment are 
considered to have experienced the event of interest and are categorized as English proficient.¹⁹ 
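
The conjunctive rule just described can be sketched in a few lines. This is a minimal illustration, assuming that composite and domain results are expressed as whole-number performance levels; the function names, argument names and record layout are hypothetical and are not taken from the agency's data system.

```python
DOMAINS = ("reading", "writing", "speaking", "listening")


def meets_standard(composite, domain_levels, composite_cut=4, domain_floor=3):
    """Conjunctive rule: composite of at least 4 AND no domain below 3."""
    return composite >= composite_cut and all(
        domain_levels[domain] >= domain_floor for domain in DOMAINS
    )


def first_year_attained(yearly_results):
    """Return the first year in which the standard is met, or None.

    yearly_results: list of (year, composite, domain_levels) tuples in
    chronological order. Later results are ignored once the standard is met,
    mirroring the treatment of subsequent test results in these analyses.
    """
    for year, composite, domain_levels in yearly_results:
        if meets_standard(composite, domain_levels):
            return year
    return None


# Hypothetical student: a high composite in year 1 is not enough because
# one domain (writing) falls below level 3.
student = [
    (1, 4, {"reading": 4, "writing": 2, "speaking": 5, "listening": 5}),
    (2, 4, {"reading": 4, "writing": 3, "speaking": 5, "listening": 5}),
    (3, 3, {"reading": 3, "writing": 3, "speaking": 4, "listening": 4}),
]
print(first_year_attained(student))
```

In this sketch a student who has not met the standard in any observed year returns `None`, which corresponds to a censored case in the event history analysis.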

Method A. Descriptive Analysis Approach 

In this approach, ELs who start English-language instruction educational programs at a specific date are 
identified. These students are then followed over time. The cumulative percentage of students reaching 
the English-proficiency criterion is shown for each successive year. In the last year of available data, the 
proportion of EL students in the cohort not reaching the English-proficient criterion is also shown. 
Proportions of students reaching the proficiency criterion each year are identified by initial ELP level and 
grade span, to highlight the differences in ELP attainment rates of groups based on these key variables. 



19. As this agency uses multiple criteria to exit students from EL status and students must take the state ELP 
assessment annually, per Title III requirements, until they leave EL status, there are students who meet the ELP 
performance standard but continue to take the ELP assessment annually, as they have not met the other criteria (which 
include academic criteria from standardized tests and classroom performance) needed to exit. Moreover, some of these 
students may not meet the ELP performance standard in a subsequent test administration. In the present analyses, once 
an EL student attains the ELP assessment criterion, he or she is considered to have experienced the event of interest, 
and any subsequent ELP assessment result, even if available, is not used. 








The following table (Exhibit 10) shows the number and percentage of students who meet the assessment 
criterion by number of years in program. 



Exhibit 10. 

(Method A) 

Number and Percent of Students Identified EL in Kindergarten to Second Grade From 
2003-04 Attaining the English-Proficient Performance Standard, 
by Initial ELP Level and Time in Program, Education Agency 1 

Elapsed Time   Total Number of Remaining   Cumulative Number of        Cumulative Percent of
(in Years)     Students With Complete      Students Becoming English   Students Becoming English
               Data                        Proficient                  Proficient

ELP Level 1
1              6,506                       809                         12%
2              5,697                       1,593                       24%
3              4,913                       2,023                       31%
4              4,483                       2,860                       44%

ELP Level 2
1              6,671                       2,184                       33%
2              4,487                       3,254                       49%
3              3,417                       3,636                       55%
4              3,035                       4,393                       66%

ELP Level 3
1              9,328                       5,271                       57%
2              4,057                       6,889                       74%
3              2,439                       7,353                       79%
4              1,975                       8,021                       86%

Exhibit reads: Among grade K to 2 students identified as EL during 2003-04, who were designated as such 
between July and October 2003 (i.e., at the beginning of the school year) and who started at ELP level 1, 12 
percent became proficient in one year. 

Note: Sample sizes are adjusted to prevent distorting the analysis with missing cases. For example, an initial sample 
of 7,728 students was identified as ELs at ELP level 1 during 2003-04 and designated as such 
between July and October 2003 (i.e., at the beginning of the school year). Among them, those missing 
assessment data in years 1 through 3 (before year 4) are excluded from the analysis (n = 1,222), yielding a total 
sample of 6,506. Students without data in year 4, the last year examined, are not excluded, as they are 
considered “censored,” that is, not having attained the English-proficient performance standard by year 4. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



The last column in Exhibit 10 provides a cumulative percent of EL students becoming English 
proficient in a given time frame. For example, for those in grades K-2 whose initial ELP level is 1, the 
percentage of students becoming proficient after one year is obtained by dividing the number of students 
becoming proficient in year 1 by the total number of students with complete data in the initial year 
(i.e., 809/6,506 = 0.12). Similarly, the cumulative percent of students becoming proficient after three years 
(i.e., at year 3) for those with initial proficiency level of 1 is 31 percent (i.e., 2,023/6,506). 

Using data from Exhibit 10, Exhibit 11 shows that the proportion of ELs at each initial 
English-proficiency level who attain English proficiency increases over four years. Note also that the lower the 
initial proficiency level, the lower the percentage of students becoming proficient over the same time 
period. For example, of EL students whose initial proficiency level during 2003-04 was level 1, 44 
percent became proficient in four years, whereas 86 percent of EL students whose initial proficiency 
level was level 3 became proficient in four years. 
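
The arithmetic behind Exhibit 10 can be reproduced directly from the reported counts. The following sketch uses the initial sample sizes and cumulative counts shown in Exhibit 10; the variable and function names are hypothetical, and percentages are rounded to whole numbers as in the exhibit.

```python
# Counts transcribed from Exhibit 10 (grades K-2, Education Agency 1).
initial_n = {1: 6506, 2: 6671, 3: 9328}
cumulative_proficient = {
    1: [809, 1593, 2023, 2860],
    2: [2184, 3254, 3636, 4393],
    3: [5271, 6889, 7353, 8021],
}


def cumulative_percent(level):
    """Cumulative percent proficient after years 1-4, relative to the initial sample."""
    n = initial_n[level]
    return [round(100 * count / n) for count in cumulative_proficient[level]]


def remaining_students(level):
    """Students not yet proficient at the start of each year (years 1-4)."""
    n = initial_n[level]
    already_proficient = [0] + cumulative_proficient[level][:-1]
    return [n - count for count in already_proficient]


for level in sorted(initial_n):
    print(level, remaining_students(level), cumulative_percent(level))
```

Note that the denominator is always the initial sample for the ELP level, not the number of students remaining in a given year; this is what makes the percentages cumulative.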



Exhibit 11. 

(Method A) 

Cumulative Percentage of Students Attaining English Proficiency, by Year, Kindergarten 
to Second Grade (Without Missing Records), Education Agency 1 

[Bar chart: cumulative percentage of students attaining English proficiency by year, for initial ELP Level 1 (n = 6,506), Level 2 (n = 6,671), and Level 3 (n = 9,328). Horizontal axis: percent attaining English proficiency, 0% to 100%.] 

Exhibit reads: Among grade K to 2 students identified as ELs during 2003-04, who were designated as 
such between July and October 2003 (i.e., at the beginning of the school year) and who started at ELP level 
1, 12 percent became English proficient in one year. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibits 12 and 13 present the same analyses for the grade cohort 3 through 5. 

Exhibit 12. 

(Method A) 

Number and Percent of Students Identified EL in Third to Fifth Grade 
From 2003-04 Attaining the English-Proficient Performance Standard, 
by Initial ELP Level and Time in Program, Education Agency 1 

Elapsed Time   Total Number of Students   Number of Students    Cumulative Percent of
(in Years)     With Complete Data         Becoming Proficient   Students Becoming
                                                                Proficient

ELP Level 1
1              714                        60                    8%
2              654                        151                   21%
3              563                        263                   37%
4              451                        359                   50%

ELP Level 2
1              168                        75                    45%
2              93                         93                    55%
3              75                         119                   71%
4              49                         133                   79%

ELP Level 3
1              281                        215                   77%
2              66                         238                   85%
3              43                         252                   90%
4              29                         263                   94%

Exhibit reads: Among grade 3 to 5 students identified as EL during 2003-04, who were designated as such 
between July and October 2003 (i.e., at the beginning of the school year) and who started at ELP level 1, 
8 percent became proficient in one year. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit 13. 

(Method A) 

Cumulative Percentage of Student Proficiency, by Year, 
Third to Fifth Grade (Without Missing Records), Education Agency 1 

[Bar chart: cumulative percentage of students attaining English proficiency by year, for initial ELP Level 1 (n = 714), Level 2 (n = 168), and Level 3 (n = 281). Horizontal axis: percent attaining English proficiency, 0% to 100%.] 

Exhibit reads: Among grade 3 to 5 students identified as EL during 2003-04, who were designated as such 
between July and October 2003 (i.e., at the beginning of the school year) and who started at ELP level 1, 
8 percent became proficient in one year. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



The pattern observed in the third- to fifth-grade cluster was similar to that seen in the grades K to 
2 cluster. The lower the initial proficiency level, the longer it took to become proficient on the ELP 
assessment. For those whose initial proficiency level was 1 in 2003-04, about 50 percent became 
proficient in four years, whereas 94 percent of EL students whose initial proficiency level was 3 in 
2003-04 became proficient in four years. According to the descriptive results, the percentages of 
students becoming English proficient, conditioned on initial proficiency level, are higher in the grade 
3 to 5 cluster than in the grade K to 2 cluster in all but two instances (i.e., those at initial ELP level 
1 after one or two years). 
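
The comparison between the two grade clusters can be checked against the cumulative percentages reported in Exhibits 10 and 12. The sketch below simply lists the cases in which the grade 3 to 5 percentage is not higher than the grade K to 2 percentage; the variable names are hypothetical.

```python
# Cumulative percent proficient after years 1-4, by initial ELP level.
grades_k_2 = {1: [12, 24, 31, 44], 2: [33, 49, 55, 66], 3: [57, 74, 79, 86]}  # Exhibit 10
grades_3_5 = {1: [8, 21, 37, 50], 2: [45, 55, 71, 79], 3: [77, 85, 90, 94]}   # Exhibit 12

# (initial ELP level, year) pairs where grades 3-5 do not exceed grades K-2.
exceptions = [
    (level, year)
    for level in sorted(grades_k_2)
    for year, (lower, upper) in enumerate(
        zip(grades_k_2[level], grades_3_5[level]), start=1
    )
    if upper <= lower
]
print(exceptions)
```

The two exceptions are both at initial ELP level 1 (after one and after two years), matching the statement above.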

Method B. Event History Analysis Approach 

Event history analysis is an approach for estimating the probability that a particular event of interest will 
occur in a given time frame. The event of interest here is an EL student’s attaining the ELP criterion. 
Several event history analysis methods are available. The one adopted here is the product-limit estimator 
(Klein and Moeschberger 1997, 84), commonly known as the Kaplan-Meier estimator. This event 
history analysis creates a variety of estimates. Of particular interest here is the survival function shown in 
Appendix E. 

The survival function provides the probability that students will become English proficient at a particular 
time. This statistic is well suited to all students who experience the event of becoming proficient within a 
given time frame (for the current analysis, in four years). However, a survival function estimate cannot 
be calculated for students who do not attain the English-proficient criterion. This is a problem, since 
substantial numbers of students do not meet the criterion after five years. (In event history analysis, 
students who do not experience the event within the observed time frame are said to be “censored.” 
Similarly, students who have experienced the event of interest are said to be “noncensored.”) The use of 
only noncensored ELs (i.e., those who meet the criterion) for the event history analysis will very likely 
generate an underestimate of the time it takes to become English proficient (i.e., to experience the 
event). To overcome this shortfall, two corrective procedures have been adopted. The first procedure 
(Censored Adjustment 1) yields an underestimate of the time it will take, while the second (Censored 
Adjustment 2) yields an overestimate of that time. The use of both adjustments provides comparative 
information on time frames that ELs are likely to need to attain the English-proficient criterion. For 
Censored Adjustment 1, all students who are censored are assumed to attain the English-proficient 
criterion in the following year. For example, all students censored at year 4 (4th year in the program) are 
assumed to attain the criterion at year 5 (the 5th year). Adjustment 1 is an underestimate because it is 
unlikely that all students in EL programs in EA 1 will attain English proficiency in year 5 and instead 
some students will take a longer time until they become proficient. Thus, event history analysis estimates 
will be shorter than what would be observed if all students could be followed until they actually reached 
the English-proficient criterion. Censored Adjustment 2 draws on Hakuta and colleagues (2000) and 
Cook and colleagues (2008). Hakuta and colleagues state that students can attain academic ELP 
somewhere between four and seven years. For this analysis, seven years is used as the maximum time for 
students to attain the English-proficiency criterion. Cook and colleagues demonstrate that students at 
different English-proficiency levels grow at different rates. For the Censored Adjustment 2, the empirical 
observations made by Hakuta and colleagues and Cook and colleagues are combined. That is, students 
who started at the lowest proficiency level (level 1) in the 2003-04 school year are assumed to take 
seven years to attain the English-proficient criterion. Students starting at level 2 will take six years, and 
students starting at level 3 will take five years. Adjustment 2 is assumed to be an overestimate, as 
“maximum time frames” from prior empirical research are utilized.²⁰ 

Data from the noncensored and censored students (with imputed times) are then analyzed, using event 
history analysis. However, the number of students becoming proficient over time and the number of 
censored students are exactly the same for the two methods described above. The only difference is the 
year imputed for the censored cases.21 
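
The product-limit calculation and the two imputation rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run for Education Agency 1: the function names and the toy records are hypothetical, and the convention that a student censored in a given year still counts as "at risk" in that year is an assumption. How imputed cases are fed back into the estimator (as events or as later censoring times) is a modeling choice that the sketch leaves to the analyst.

```python
from collections import Counter

# Censored Adjustment 2: assumed maximum years to proficiency by initial ELP level.
MAX_YEARS_BY_LEVEL = {1: 7, 2: 6, 3: 5}


def kaplan_meier(records):
    """Product-limit estimate of the probability of having become proficient.

    records: iterable of (year, attained) pairs, where attained is True if the
    student met the criterion in that year and False if the student was censored.
    Returns {year: 1 - S(year)} for each year in which at least one event occurs.
    """
    records = list(records)
    events = Counter(year for year, attained in records if attained)
    exits = Counter(year for year, _ in records)
    at_risk = len(records)
    survival = 1.0
    curve = {}
    for year in sorted(exits):
        d = events.get(year, 0)
        if d:
            survival *= 1 - d / at_risk
            curve[year] = 1 - survival
        at_risk -= exits[year]  # events and censored cases both leave the risk set
    return curve


def imputed_year(adjustment, censored_year, initial_level):
    """Year assigned to a censored student under each adjustment."""
    if adjustment == 1:
        return censored_year + 1                  # assumed proficient the following year
    if adjustment == 2:
        return MAX_YEARS_BY_LEVEL[initial_level]  # assumed maximum time frame
    raise ValueError("adjustment must be 1 or 2")


if __name__ == "__main__":
    # Hypothetical toy cohort: (year, attained)
    toy = [(1, True), (2, True), (2, False), (3, True), (4, False)]
    print(kaplan_meier(toy))
    print(imputed_year(1, 4, 1), imputed_year(2, 4, 1))
```

The estimator itself does not depend on which adjustment is chosen; only the year attached to each censored record changes, which is why the two adjustments share the same counts of proficient and censored students.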

The exhibits below (14 and 15) graphically display results from the event history analysis for both 
methods. The horizontal axis of each figure displays years in language instruction educational programs. 
The vertical axis shows the probability of an EL’s becoming proficient. (Results tables can be seen in 
Appendix E.) 

Two grade clusters are displayed: kindergarten to second grade and third to fifth grade. The event history 
graphs are typically displayed as step functions. The horizontal axes in the graphs displaying Censored 
Adjustment 1 (Exhibit 14) and Censored Adjustment 2 (Exhibit 15) show seven years. Note that 
Censored Adjustment 1 has a maximum time of five years. The seven-year time line is provided in 
displayed graphs for comparative purposes. 



20. For the Censored Adjustment 2, because there are no additional data after year 4, the imputed values do not 
provide additional information to calculate further probabilities. Thus, from years 4 to 7, the probabilities are effectively 
unchanged. 

21. By design, the two methods provide the same results for the first two years, as n_i in the survival function is the 
same across the two methods because no censored cases are assumed in year 0, and the total number of students in year 
1 is determined by subtracting the number of students becoming proficient and the number of students censored in year 
0, which is zero, from the total number of students in year 0. (See Exhibit E.1 in comparison with Exhibit E.2.) 
Exhibit 14. 

(Method B) 

Censored Adjustment 1 

Probability of ELs Identified During 2003-04 Becoming Proficient, 
by Grade Cohort and ELP Level, Education Agency 1 
Exhibit reads: For students identified as ELs during 2003-04 who were designated as such between July 
and October 2003 (i.e., at the beginning of the school year) and who began at ELP level 1, there is a 
10 percent probability of attaining English proficiency in one year. 

Note: The state ELP assessment provides five ELP performance levels: Beginning (Level 1), Early Intermediate 
(Level 2), Intermediate (Level 3), Early Advanced (Level 4), and Advanced (Level 5). Level 4 or higher 
represents this agency’s English-Language-Proficient performance standard. The corresponding scale score 
range for each ELP performance level varies across grades. For grades K to 2, the number of students at the 
Beginning ELP level was 7,728, at the Early Intermediate ELP level was 7,603, and at the Intermediate ELP 
level was 10,045. The total number of students was 25,376. For grades 3 to 5, the number of students at the 
Beginning ELP level was 1,043, at the Early Intermediate ELP level was 235, and at the Intermediate ELP level 
was 335. The total number of students was 1,613. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



The observed event history curves from both methods (Exhibits 14 and 15) are similar. As expected, 
Censored Adjustment 1 shows students meeting the criterion in higher proportions than in Censored 
Adjustment 2. That is, Censored Adjustment 1 predicts a shorter time for students to reach the criterion 
than Censored Adjustment 2 does. But these differences are slight. With both methods, higher 
proportions of level 3 students meet the criterion in the third- to fifth-grade cluster than in the 
kindergarten to second-grade cluster. 
Exhibit 15. 

(Method B) 

Censored Adjustment 2 

Probability of ELs Identified During 2003-04 Becoming Proficient, 
by Grade Cohort and ELP Level, Education Agency 1 
Exhibit reads: For students identified as ELs during 2003-04 who were designated as such between July 
and October 2003 (i.e., at the beginning of the school year) and who began at ELP level 1, there is a 
10 percent probability of attaining English proficiency in one year. 

Note: The state ELP assessment provides five ELP performance levels: Beginning (Level 1), Early Intermediate 
(Level 2), Intermediate (Level 3), Early Advanced (Level 4), and Advanced (Level 5). Level 4 or higher 
represents this agency’s English-Language-Proficient performance standard. The corresponding scale score 
range for each ELP performance level varies across grades. For grades K to 2, the number of students at the 
Beginning ELP level was 7,728, at the Early Intermediate ELP level was 7,603, and at the Intermediate ELP 
level was 10,045. The total number of students was 25,376. For grades 3 to 5, the number of students at the 
Beginning ELP level was 1,043, at the Early Intermediate ELP level was 235, and at the Intermediate ELP level 
was 335. The total number of students was 1,613. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 
Application of Methods for Decision Making 



How might results of these two methods be used by stakeholders for decision making? Results from 
these approaches provide insight into the proportion of EL students who actually attain, as well as the 
probability that they will attain, the English-proficient criterion within certain time frames. These 
methods, however, are not intended to provide policymakers with a simple definitive answer to the “how 
long?” policy question. The analyses should support but not determine such a policy decision. 
Professional judgment by informed panelists and policymakers is required to weigh these outcomes and 
utilize them to set expected time frames that are rigorous and defensible. Also, it is important to point 
out that both analyses provide information about how long EL students are observed or estimated to take 
to attain English proficiency based on current practices. The two methods say nothing about how long it 
should take with improved instructional practice. In establishing expectations for accountability purposes, 
this expectation of improved instructional practice should also be taken into account to avoid 
unintentionally setting lower expectations. 

The following discussion applies results from Education Agency 1 to illustrate how these methods could 
be used to set adjusted time frames for EL students by initial English-proficiency level. 

For example, a time frame where 60 percent (or a probability of 0.6) of a state’s ELs attain the 
English-proficient criterion (arguably a clear majority) might be considered as an initial place to begin 
deliberations among stakeholders. A starting point time frame in which a clear majority of EL students 
have met the criterion provides both an “existence proof” that the target time frame is attainable and 
a justifiable “stretch” in target performance for both educators and students. 

To illustrate how this time frame could be identified from the empirical data, Exhibit 16 shows 
proportions (actual or predicted) of students attaining the proficiency criterion by each approach. Cells 
having proportions above 60 percent or 0.60 are shaded. 
Exhibit 16. 

Combined Outcomes From Descriptive Approach and Event History Analyses, 
by Grade Cluster, 2003-04 Initial Proficiency Level and Time 

                 Kindergarten to Second Grade                  Third to Fifth Grade
Level/Time       Descriptive  Censored       Censored          Descriptive  Censored       Censored
                 Approach     Adjustment 1   Adjustment 2      Approach     Adjustment 1   Adjustment 2
                              (Probability)  (Probability)                  (Probability)  (Probability)

ELP Level 1
  1              12%          0.10           0.10              8%           0.06           0.06
  2              24%          0.21           0.21              21%          0.14           0.14
  3              ...          ...            ...               ...          ...            ...
  4              ...          ...            ...               ...          ...            ...

ELP Level 2
  1              ...          ...            ...               ...          ...            0.32
  2              49%          0.43           0.43              ...          ...            ...
  3              ...          ...            ...               ...          ...            ...
  4              ...          ...            ...               ...          0.63*          0.57

ELP Level 3
  1              57%          0.52           0.52              77%*         0.64*          0.64*
  2              74%*         0.69*          0.69*             85%*         0.71*          0.71*
  3              79%*         0.74*          0.73*             90%*         0.76*          0.75*
  4              86%*         0.82*          0.80*             94%*         0.83*          0.79*

Cells marked "..." are not legible in this copy. Additional values survive without recoverable cell 
positions: 0.27, 0.25, 0.37, and 0.34 (ELP Level 1, years 3 and 4) and 0.48, 0.51, and 0.58 (ELP Level 2, 
years 2 to 4). 


Exhibit reads: For example, for grade K to 2 students identified as ELs during 2003-04, who were 
designated as such between July and October 2003 (i.e., at the beginning of the school year) and who started 
at ELP level 1, the descriptive approach shows that 12 percent became English proficient in one year, and 
event history analyses in both Censored Adjustments 1 and 2 show that there is a 10 percent probability of 
attaining English proficiency in one year. 

* Cells having proportions above 60 percent or .60 are shaded. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit 16 (above) presents four years of results. (Recall that there were five data points representing 
change over four instructional years). Estimated probabilities beyond year 4 in the event history analyses 
are identical, as these analyses adjust estimated proportions within the observed time frame on the basis 
of censored case assumptions. 

For all levels and clusters, the descriptive approach yields higher proportions of students attaining the 
ELP criterion than do the event history analyses’ probabilities. Similarities between observed proportions 
and estimated probabilities might be expected. But the descriptive approach does not take censoring into 
account. As mentioned above, the event history analysis using Censored Adjustment 1 has at least the 
same probabilities as Censored Adjustment 2. 
None of the approaches shows more than 50 percent, or 0.50, probability of the lowest level (ELP 
level 1) EL students’ attaining the English-proficient criterion within the observed four-year time frame. 
For ELs entering Education Agency 1 at an initial ELP level 2, both the descriptive and Censored 
Adjustment 1 approaches show more than 60 percent, or 0.60, of students in the kindergarten to 
second-grade cluster meeting the criterion within four years. For the third- to fifth-grade cluster, the 
descriptive approach shows 60 percent of students meeting the criterion in three years, and the Censored 
Adjustment 1 shows 0.60 occurring in four years. For Censored Adjustment 2, neither grade cluster has a 
probability of more than 0.60 within the observed time frame. Students starting at the higher initial 
English-proficiency level (ELP level 3) reach the 60 percent or 0.60 threshold far sooner than students 
starting at lower proficiency levels. 
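
The threshold reading described above amounts to finding the first year in which a series exceeds 0.60. A minimal Python sketch follows, using the ELP level 3 values reported in Exhibit 16 (percentages written as proportions). The function and variable names are hypothetical and introduced only for illustration.

```python
THRESHOLD = 0.60

# ELP level 3 values for years 1 to 4, as reported in Exhibit 16.
LEVEL_3 = {
    ("K-2", "descriptive"): [0.57, 0.74, 0.79, 0.86],
    ("K-2", "censored_adjustment_1"): [0.52, 0.69, 0.74, 0.82],
    ("K-2", "censored_adjustment_2"): [0.52, 0.69, 0.73, 0.80],
    ("3-5", "descriptive"): [0.77, 0.85, 0.90, 0.94],
    ("3-5", "censored_adjustment_1"): [0.64, 0.71, 0.76, 0.83],
    ("3-5", "censored_adjustment_2"): [0.64, 0.71, 0.75, 0.79],
}


def first_year_above(values, threshold=THRESHOLD):
    """Return the first year (1-based) whose value exceeds the threshold, else None."""
    for year, value in enumerate(values, start=1):
        if value > threshold:
            return year
    return None


if __name__ == "__main__":
    for (cluster, approach), series in LEVEL_3.items():
        print(cluster, approach, first_year_above(series))
```

A return value of None corresponds to the case in the text where a group does not reach the threshold within the observed four-year window. The result is only a starting point for deliberation, not a decision rule.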

On the basis of the above results, Education Agency 1 might consider the following approach for setting 
an initial expected time frame for reaching the ELP performance standard. The students at the lowest 
initial ELP level are not observed or predicted to attain the English-proficient performance standard at 
the 60 percent, or 0.60, threshold within the four-year time frame.22 Thus, a process is needed to predict 
when these students may reach this threshold. To calculate how long it will take to reach this threshold, a 
simple regression procedure can be employed using the available data. (See Exhibit 17, below.) 



22. Alternatively, states could rank districts on the basis of the proportion of EL students attaining the English- 
language-proficient standard each year. Following this, an nth percentile rank criterion could be established. A graph 
could be plotted and smoothed and used to identify the target attainment rate from that district-ranked performance. 
Exhibit 17. 

Percent of Initial ELP Level 1 ELs Attaining the English-Proficient Threshold 
Across Analytic Approaches and Grade Clusters Predicted Beyond Observed Years 

[Line graph. Series: Descriptive Approach; Event History Analysis.] 



Exhibit reads: Using a descriptive approach, 10 percent of students across grade clusters at ELP level 1 
were estimated to have attained the English-proficient standard in year 1. For the event history analysis 
approach the estimated value was 8 percent in year 1. 

Note: Values up to year 4 were calculated by averaging across approaches and grade clusters in Exhibit 16. Values 

beyond year 4 were obtained by using the year 1 to 4 averages and calculating a slope, using a linear 
regression. Therefore, the values for years 5 through 7 are predicted values. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



The graph and table displayed in Exhibit 17 show an example of a simple regression procedure. Values 
in Exhibit 17 are derived by averaging across approaches and grade clusters. For example, the percentage 
of students at Level 1 attaining the English-proficient standard in the grade K to 2 cluster, in the first 
year of English language instruction, is 12 percent (see Exhibit 16). The proportion attaining proficiency 
in the grade 3 to 5 cluster is 8 percent. The average proportion across clusters for level 1 ELs in year 1 is 
10 percent, which is the value shown in the descriptive approach in Exhibit 17 for year 1. For the event 
history analysis, averages are calculated across Censored Adjustments 1 and 2 and across clusters. Values 
up to year 4 in the table are calculated similarly. Values beyond year 4 are obtained by using the year 1 to 
4 averages and calculating a slope, using a linear regression. The slope then is added to subsequent years. 
The slope value of the descriptive approach is 12 percent, and the slope of the event history approach is 
0.097. These values are then added to the year 4 average, which results in values of 59 percent, or 0.47, 
for year 5. The slope values are then added to the predicted year 5 results, and so on. On the basis of the 
descriptive approach, the lowest level students are predicted to reach the 60 percent, or 0.60, threshold 
in 5 to 6 years, and on the basis of the event history approach, 6 to 7 years. 
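
The averaging and extrapolation steps just described can be reproduced with a short Python sketch. It uses only figures stated in the text: the year 1 values, the reported slopes (12 percentage points and 0.097), and the year 5 predictions (59 percent and 0.47). The function names are hypothetical, and the projection starts from year 5 because the year 4 averages are not quoted directly.

```python
def mean(values):
    """Simple average across approaches and/or grade clusters."""
    return sum(values) / len(values)


def project(start_year, start_value, slope, last_year):
    """Add the regression slope once per year after start_year."""
    predictions = {start_year: start_value}
    value = start_value
    for year in range(start_year + 1, last_year + 1):
        value += slope
        predictions[year] = value
    return predictions


def first_year_reaching(predictions, threshold):
    """First year in which the predicted value is at or above the threshold."""
    for year in sorted(predictions):
        if predictions[year] >= threshold:
            return year
    return None


if __name__ == "__main__":
    # Year 1 averages for ELP level 1 (values from Exhibit 16).
    print(mean([12, 8]))                    # descriptive: K-2 and 3-5 clusters
    print(mean([0.10, 0.10, 0.06, 0.06]))   # event history: both adjustments, both clusters

    descriptive = project(5, 59.0, 12.0, 7)
    event_history = project(5, 0.47, 0.097, 7)
    print(descriptive, first_year_reaching(descriptive, 60.0))
    print(event_history, first_year_reaching(event_history, 0.60))
```

Because the projection is linear, it will eventually exceed 100 percent if carried far enough. It is therefore best read as a short-range planning aid and not as a forecast.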






For initial ELP level 2 students, two of the three approaches show that the 60 percent, or 0.60, threshold 
is met by or before the fourth year. Thus, a four-year time line could be adopted for level 2 students. 

Level 3 students have noticeably different time frames by grade cluster. A one-year time line could be 
adopted, but that may not be reasonable for ELs in the kindergarten to second-grade cluster, especially 
because literacy is strongly developing and measured substantially differently at these grades. Were this 
agency to set a single criterion for both clusters, a time frame of two years might be more reasonable. 

Clearly, students’ initial ELP level strongly influences the expected time frame for their attaining English 
proficiency. Using these analyses, more refined time-to-English-proficiency criteria could be derived for 
EA 1. For example, EL students at initial ELP level 3 might be expected to attain the English-proficient 
criterion in two years, initial ELP level 2 students in four years and initial ELP level 1 students in six or 
seven years. 

Summary 

Several considerations should be addressed when interpreting findings. First, only one cohort of students 
was used in analyses. Ideally, states would use as many cohorts as is practicable. Second, states should 
use empirical approaches, along with the expert judgment of EL educators and informed policymakers, 
to determine an ambitious but reasonable time range for attaining the ELP performance standard. Using 
only empirical analysis to establish time frames is insufficient. Third, EA 1 uses multiple criteria to 
determine whether ELs require further English language instructional support. Analyses here only 
identify time to when students might be considered proficient. Fourth, two grade clusters within 
elementary grades were analyzed. Higher grades might exhibit very different time-to-proficiency 
characteristics. States should examine all grade clusters when conducting these types of analyses. Fifth, 
students with censored data were included in the event history analyses. That is the advantage of using 
event history analysis. Nonetheless, assumptions were made about how censored students performed. In 
performing event history analysis, states should deliberate on the assumptions they choose to make 
about censored cases. These assumptions may differ from those presented in the EA 1 example. Sixth, 
two approaches are used to explore this question. Other statistical methods could be employed, as well 
(e.g., mixed linear models). Approaches presented here should stimulate discussion among states and 
researchers on more precise methods of determining a time range for EL students to attain the 
English-proficient performance standard. Finally, the results presented here describe how long it is observed 
to take to reach the English-proficiency criterion, not how long it should take. Policymakers should take this 
into account when determining target time frames and percentages of EL students expected to meet 
them. 
IV. Taking Into Account English Learners’ 
English-Language Proficiency Level When 
Establishing Academic Progress and 
Proficiency Expectations 



Overview 

The preceding two chapters have illustrated empirical methods states can use to inform their 
deliberations on (1) determining an English-language-proficient performance standard on the state ELP 
test in relation to English Learners’ (ELs’) performance on academic content tests at different ELP 
levels; and (2) establishing a challenging and realistic time range to attain that English-language-proficient 
performance standard. This final chapter explores empirical methods that states can use to inform 
deliberations on setting academic progress and proficiency expectations for ELs that reasonably take into 
account students’ ELP levels and their time in the state school system. Viewed through the lens of Title 
III requirements, where the prior two chapters focused on issues related to ELP progress over time 
(AMAO 1), and a rigorous English-language-proficient performance standard (AMAO 2), this chapter 
bears directly on AMAO 3, and therefore necessarily on Title I academic performance criteria and targets 
for the EL subgroup because these achievement expectations and results (called adequate yearly 
progress, or AYP) are applied as AMAO 3 for Title III subgrantees. 

There has long been concern among researchers and policy analysts regarding ESEA Title I AYP’s 
status bar academic performance expectation and its 100 percent academic-proficiency-by-2014 target 
(see, for example, Linn, 2005; Ho, 2008; Ryan and Shepard, 2008). In partial response to the first 
concern, the U.S. Department of Education’s Growth Model Pilot Project (U.S. Department of Education, 
2005) allowed approved states to refine their Title I accountability systems in order to recognize and 
receive credit for students meeting predefined academic growth criteria as being “on track” to meet or 
exceed the state’s academic proficiency performance standard within a reasonable time frame, and 
therefore making AYP, the law’s 2014 100 percent proficiency requirement notwithstanding (U.S. 
Department of Education 2011). In none of these pilot states, however, are growth expectations set for 
ELs by their current or expected ELP level. 

The issue of setting academic progress and proficiency expectations for ELs conditioned to some degree 
by their ELP level must be approached with extreme care. On the one hand, researchers have 
consistently noted that academic accountability provisions of the ESEA are particularly problematic 
when applied to the EL population, as limited English language proficiency fundamentally affects an EL 
student’s capacity both to benefit from academic content instruction delivered in English and to 
demonstrate knowledge and abilities on academic assessments given in English (Abedi 2004; Francis and 
Rivera 2007). On the other hand, educational rights advocates have been wary of proposals to establish 
differential expectations in academic progress and performance for different subgroups of students, out 
of concern that these could easily lead to lower expectations and diminished attention by educators to 
these students’ academic needs (National Council of La Raza 2006; Education Trust 2006). Moreover, 
the now well-recognized, long-term English Learner phenomenon (Olsen 2010), whereby students are 
unable to meet the linguistic and other (often academic) criteria required to exit EL status despite many 
years in the U.S. school system, suggests that conditioning academic expectations or results solely on 
students’ ELP level, without equal regard to their time in the state’s school system, could have 
unintended negative consequences for this population.23 

In light of these issues, and anticipating ESEA reauthorization, a national group of EL researchers has 
recently argued for the law to incorporate time into accountability provisions for the acquisition of 
English language proficiency, and to require states to establish expected time frames for the development 
of ELP. Moreover, these researchers have argued that, for each EL assessed in English, states should 
incorporate ELs’ ELP into accountability provisions for content area achievement using these expected time 
frames (Working Group on ELL Policy 2010, 2011). 

Two key points are inherent in these interrelated recommendations: First, content area achievement 
results should be adjusted for EL students’ ELP level. Second, given the importance of setting an 
expected time frame for EL students to attain ELP, this time frame should be part of the content area 
performance result adjustments. Two methods are mentioned in the Working Group’s documents as 
possibilities for satisfying these requirements: progressive benchmarking and indexed (i.e., weighted) 
progress. 

This chapter explores these methods by illustrating how they might be operationalized with empirical 
data for a state’s policymaking deliberations. Specifically, it examines two approaches to the progressive 
benchmarking method, and one approach to the indexed progress method. Additionally, given potential 
equity concerns (or stakeholder reluctance) regarding conditioning academic performance expectations 
or results specifically for ELs, this chapter also considers a third method, which does not rely on EL 
students’ ELP level — the status and growth accountability matrix approach. This method acknowledges 
student attainment of academic proficiency (i.e., the AYP performance standard) or a predetermined, 
acceptable level of student growth toward academic proficiency (e.g., a level of academic progress to be 
considered “on track” to proficiency in a reasonable time frame). 

Key Approaches 

We use two approaches to progressive benchmarking, and one approach to indexed progress; both 
methods take into account an EL’s ELP level and time in the state school system when setting his or her 
academic progress and proficiency expectations. The third approach explored explicitly ignores an EL’s 
ELP level when judging academic progress and proficiency. Each method is described in detail as 
follows. 

1. The progressive benchmarking methods adjust either (a) EL students’ content achievement 
scale scores or (b) EL students’ weight (their individual “count”), based on each student’s ELP 
level relative to his or her initial ELP level and time in the state school system. In this method, 
there is an expectation that (1) students will increase in English language proficiency annually 
from their level of initial English proficiency and that (2) students will increase in content 
achievement annually. Thus, while recognizing the effect of limited English proficiency on ELs’ 
academic performance on tests given in English, scale score or calculative weight adjustments 
lessen as students increase in ELP level, as expected, or, if they do not, as they continue in EL 
status over time. At the end of the time frame expected for ELs to attain English language 



23. For example, an EL at an “intermediate” ELP level might have his or her English or language arts (ELA) 
performance result adjusted in recognition of this level of English proficiency; yet the student may have spent several 
years in the state’s school system at this same ELP level. It would be an unintended negative consequence if 
accountability policy conferred to a school system an “on track” academic judgment for this EL student as a result of 
her lack of expected ELP progress. 
proficiency (or sooner if they attain that level), students’ content achievement scores or 
calculative weights are no longer adjusted. In essence, expected performance benchmarks 
progressively increase (and corresponding adjustments progressively decrease) to the point at 
which no adjustments are made at all. 

2. The indexed progress method uses an EL’s ELP growth as a proxy for academic content 
performance on a weighted, time-sensitive basis for more newly arrived ELs who enter the 
state’s school system at lower initial ELP levels. These weights and time frames are empirically 
derived for each subject matter and grade tested because “the impact of limited English 
proficiency on academic performance varies by subject matter and grade [e.g., ELs with lower 
levels of language proficiency have more difficulty demonstrating content knowledge in English 
or language arts compared with mathematics, and this difficulty increases at higher grade 
levels]” (Working Group on ELL Policy 2010, p. 5). 

3. The status and growth accountability matrix (SGAM) method acknowledges student 
attainment of academic proficiency (i.e., the AYP performance standard) or a predetermined, 
acceptable level of student growth toward academic proficiency (e.g., a level of academic 
progress to be considered “on track” to proficiency in a reasonable time frame), without 
considering an EL’s ELP level. 

Before applying these methods using empirical data, the following presents a more in-depth explanation 
of each method. 

Method 1 (progressive benchmarking) adjusts either the scale scores or the way students are counted 
(weighted) in calculating results. To support the creation of adjustments, states need some knowledge of 
the distribution of EL students’ content achievement scores over time and by ELP level. The box plots 
in Exhibit 18 below show the distribution of scores for EL students in grade 3 of Education Agency 1, 
the Education Agency selected for this chapter’s illustrative analyses. 

Two sets of box plots are shown, one for mathematics and one for English or language arts. On the 
horizontal axis, ELP levels are displayed. The vertical axis shows the content assessment scale score 
range. A horizontal line is drawn at the scale score value of 350, which represents the “proficient” cut 
score on the assessment for this grade. The boxes at each ELP level show the distribution of scale 
scores. The bottom line of each box is drawn at the 25th percentile of the distribution, and the top line 
represents the 75th percentile. The line drawn within each box shows the median score for that level, 
and the diamond displays the mean. The T-shaped lines (“whiskers”) above and below each box show 
extreme scores for that distribution. 



24. The importance of ensuring that the English-language-proficient performance standard is carefully examined 
relative to EL students’ likelihood of attaining academic content area achievement standards is explored in Chapter II. 
Exhibit 18. 

Education Agency 1, Grade 3: Box Plots of English or Language Arts 
and Mathematics Scale Scores, by ELP Performance Level (2007-08) 




Exhibit reads: Grade 3 EL students performing at the Early Advanced ELP performance level (or level 4) 
have a median performance of 334 scale score points in English or language arts, 16 points below the 
proficient performance standard. 

Note: Appendix Exhibit F.1 presents the descriptive statistics associated with this graph. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Notice that the lower the ELP level, the lower the distribution of English or language arts and 
mathematics scale scores. At the lowest ELP levels, a vast majority of EL students are performing well 
below the proficient standard. The poor content test performance by low ELP-level ELs seen in these 
exhibits has been observed across many other grades and education entities (i.e., districts and states) and 
with other academic content and ELP assessments. While these exhibits do not show the length of time 
ELs have been in the system, they do provide a starting point for considering the setting of ambitious 
and realistic progress expectations. 

Specifically, the distributions shown in the exhibits above can be used to establish “benchmarks” to 
adjust expectations based on ELP level. Adjusting content scores based on language proficiency level 
alone is insufficient, however. Students are expected to grow in their ELP (as discussed in Chapter III) 
and those time-based expectations should be factored into content score adjustments. For illustrative 
purposes, a timeline like the one shown in Exhibit 19 below is adopted here. 







Exhibit 19. 

Expected English-Language Proficiency (ELP) Level Growth, by Year in State Schools 

              Expected ELP Level by Year in School
ELP Level     Initial Year   2nd Year      3rd Year      4th Year
Level 1       Level 1        Level 2       Level 3       Level 4
Level 2       Level 2        Level 3       Level 4       Proficient
Level 3       Level 3        Level 4       Proficient    -
Level 4       Level 4        Proficient    -             -

Exhibit reads: EL students starting at ELP level 1 in the initial year are expected to move to level 2 in the 
2nd year, level 3 in the 3rd year, and level 4 in the 4th year, whereas students starting at ELP level 4 at the outset 
are expected to become English language proficient in the 2nd year. 





Students starting at the lowest level (Level 1) could potentially receive four years of adjustments. But 
each year these students would receive lesser adjustments based on the expectation that they will attain 
progressively higher ELP levels. Students at Level 4 would receive an adjustment only in their initial year. 
This table illustrates the progressively increasing expectations of ELP over time. Accordingly, academic 
content score adjustments lessen either as ELP level increases or, failing that, as ELs’ time in the school 
system progresses.^25 And as will be seen in the worked examples below, adjustments can be made either 
to ELs’ actual scale score result, or to their calculative weight. 
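
The timeline in Exhibit 19 can be expressed as a small rule: one ELP level of growth per year until the proficient standard is reached. The Python sketch below is illustrative only; the function names and the numeric code used for "Proficient" are assumptions, and the function keeps returning "Proficient" for years that the exhibit marks with a dash.

```python
PROFICIENT = 5  # numeric stand-in for "Proficient" (an assumption of this sketch)


def expected_elp_level(initial_level: int, year: int) -> int:
    """Expected ELP level in a given year (1 = initial year), capped at PROFICIENT."""
    if not 1 <= initial_level <= 4:
        raise ValueError("initial_level must be between 1 and 4")
    if year < 1:
        raise ValueError("year must be 1 or greater")
    # One level of growth per year, as posited in Exhibit 19.
    return min(initial_level + year - 1, PROFICIENT)


def years_of_adjustment(initial_level: int) -> int:
    """Number of years a student is still expected to be below Proficient."""
    if not 1 <= initial_level <= 4:
        raise ValueError("initial_level must be between 1 and 4")
    return PROFICIENT - initial_level
```

Under these assumptions, a student starting at Level 1 is expected to be at Level 4 in the fourth year and could receive up to four years of adjustments, while a student starting at Level 4 would receive an adjustment only in the initial year.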



25. Note that these linear ELP growth expectations are posited for the purpose of adjusting content performance 
expectations in relation to students’ ELP level and time in the school system. Empirical analyses of ELP growth suggest 
that several factors may influence the way ELP growth occurs. (See the text box below for a brief elaboration.) 
How Might Variability in Growth on ELP Assessments Inform the Setting of 
Annual Progress Expectations? 

Although research about how ELs grow on ELP assessments is limited, the existing evidence 
suggests students at different ELP levels grow at different rates (Cook and Zhao 2011; Linquanti and 
others, forthcoming; Taylor and others, forthcoming). Exhibit 20 below depicts growth rates (in 
vertically scaled ELP assessment scale score points) over three years for ELs at the third, fourth, or 
fifth grade in the initial year. (Only EL students with four years of ELP assessment scores were 
included.) EL students at lower initial ELP levels have much steeper growth rates than students at 
higher initial ELP levels. This characteristic has been observed on many different types of ELP 
assessments, motivating Cook and others (2008) to introduce the descriptive mnemonic, “lower is 
faster, higher is slower.” That is, students at lower ELP levels tend to grow at higher rates than those 
at higher ELP levels. This observation also extends to grades: For a given initial ELP level, EL 
students at lower grades tend to grow at higher rates than their counterparts at higher grades. 



Exhibit 20. 

Rates of Growth in ELP Scale Score, Grades 3, 4, 5, by ELP Level in Base Year 




Exhibit reads: An EL in grade 3, 4, or 5 beginning at ELP level 4 in the initial year of the analysis is 
estimated on average to grow from an ELP scale score of 361 to 389 over a three-year period. 

Source: Adapted from Cook and Zhao (2011). 

These patterns suggest that setting a “one-size-fits-all” progress expectation for ELs may not be 
realistic. Doing so could set lower expectations for some ELs and potentially unattainable 
expectations for others. Nor does this exhibit reveal the whole story — many other factors might also 
influence ELP growth. For example, ELs who read at grade level in their native language will likely 
grow at different rates than students not literate in their native language — so too for students with 
interrupted formal education compared with those consistently enrolled in school. Beyond student 
characteristics, different EL instructional services and settings may influence ELP growth rates. All 
said, we need to better understand how ELs grow on ELP assessments and what characteristics and 
factors influence that growth. 
Method 2 (indexed progress) takes an EL’s growth in ELP and applies it as a proxy measure of ELA 
proficiency. That is, for those ELs at the lowest levels of initial English proficiency, sufficient growth in 
ELP scores can be used, on a weighted, time-sensitive basis, in lieu of (or combined with) their ELA 
proficiency scores. It should be noted that this method has in the past been disallowed under ESEA 
Title I AYP regulatory requirements. Nevertheless, given the Department’s recent flexibility policy 
allowing states to request waivers from key ESEA accountability provisions, and ESEA Title I and Title 
III requirements that states’ English language proficiency standards both reflect and support 
development of academic language found in state academic content standards, a rationale remains for 
arguing that, for more newly arrived, low ELP-level ELs, progress in ELP can reasonably indicate 
proximal progress in ELA.^26 

Specifically, for recently arrived ELs at the beginning stages of learning English, this “proxy” approach 
may offer a more valid application of test results than use of their performance on ELA assessments 
alone. Again, the use of ELP gains as a proxy for ELA should be only for a limited time.^27 

How might progress expectations be generated for consideration? Exhibit 21 shows 
second-to-third-grade growth in ELP for four language proficiency levels in Education Agency 1. 



Exhibit 21. 

English Language Proficiency Composite Scale Score Growth, by ELP Level, 
From Second to Third Grade, by Second-Grade ELP Level 




ELP Assessment Composite Level Growth (in Scale Score Points) 



Exhibit reads: Among third-grade EL students who scored at ELP level 1 in second grade, 25 percent 
grew at a rate of 87 score points or less from second to third grade (i.e., such an EL at the 25th percentile 
of the distribution was estimated to have ELP composite scale score growth of 87 score points). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



26. While a case might be made for ELP growth’s signaling the potential for growth in mathematics, the 
relationship between these two subject areas is less compelling for arguing that ELP growth may serve as proxy for math 
performance. 

27. Interestingly, current federal law allows states to waive ELs’ ELA performance results in AYP accountability 
calculations for their first year in the state school system, regardless of ELP level. This method builds on this tacit 
acknowledgment of the problematic measurement issues in ELA for newly arrived low-ELP-level ELs, and uses 
empirical evidence of the relationship between ELP and ELA, as well as time-based ELP growth expectations. 
The bar chart displays results for three percentile ranks: the 25th, 50th, and 75th percentiles. At ELP 
level 1, third-grade EL students with a composite growth score of 87 scale score points are at the 
25th percentile (i.e., 25 percent of ELs grew at that rate or less). ELP 1 students with a composite 
growth score of 182 scale score points are at the 75th percentile. Notice that as ELP levels increase, 
growth rates decrease. 28 If an indexed progress method were adopted, growth expectations should differ 
based on ELP level. Potential acceptable target growth values generated from this approach could be 
used alone or in conjunction with a weighted or index formula with EL students’ ELA scores to create a 
composite indexed progress result (e.g., ELP level 1 students might receive the following weights/index 
values: 70 percent ELP growth score, 30 percent ELA score). Time frames for applying these weights 
would also need to be set based on empirical evidence relative to expected or actual ELP growth. 
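
One way to picture a composite indexed progress result is as a weighted combination of two indicators. The sketch below assumes, purely for illustration, that both inputs are yes/no indicators (whether the student met an ELP growth target and whether the student scored proficient in ELA); the text does not specify how the two components would be scaled, and the function name is hypothetical. The default 70/30 weighting is the example given above for ELP level 1 students.

```python
def indexed_progress(met_elp_growth_target: bool,
                     ela_proficient: bool,
                     elp_weight: float = 0.70) -> float:
    """Weighted composite of an ELP growth indicator and an ELA status indicator.

    elp_weight is the share given to ELP growth; the remainder goes to ELA.
    Treating both inputs as 0/1 indicators is an assumption of this sketch.
    """
    if not 0.0 <= elp_weight <= 1.0:
        raise ValueError("elp_weight must be between 0 and 1")
    ela_weight = 1.0 - elp_weight
    return elp_weight * int(met_elp_growth_target) + ela_weight * int(ela_proficient)
```

A student who met the growth target but was not proficient in ELA would receive a composite of 0.7 under the default weights. Weights for higher ELP levels, and the time frames for applying them, would need to be set from empirical evidence as the text notes.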

Method 3 (status and growth accountability matrix) acknowledges student attainment of academic 
proficiency (i.e., the AYP proficient performance standard) or a predetermined, acceptable level of 
student growth toward academic proficiency (e.g., a level of academic progress to be considered “on 
track” to proficiency in a reasonable time frame), without considering an EL’s ELP level. This method 
posits that when “on track” growth on content assessments is considered in concert with content 
proficiency (i.e., status), the need to adjust EL outcomes based on ELP level diminishes. This approach 
can be illustrated by the matrix in Exhibit 22, below. Note that there are four quadrants, which represent 
two dimensions of accountability: status (rows) and growth (columns). 



Exhibit 22. 

Status and Growth Accountability Matrix 

Columns (Growth on Content Assessment): Low Growth, High Growth 

Proficient or Above on Content Assessment 
  Low Growth (quadrant I): Students in this cell are proficient or advanced but are growing at lower 
  rates than other students. 
  High Growth (quadrant II): Students in this cell are proficient or advanced and are growing at adequate 
  rates compared with all other students. 

Not Proficient on Content Assessment 
  Low Growth (quadrant III): Students in this cell are not proficient and are growing at lower rates than 
  other students. 
  High Growth (quadrant IV): Students in this cell are not proficient but are growing at adequate rates 
  compared with other students. 

Exhibit reads: Students in quadrant I are proficient or advanced but are growing at lower rates than other 
students. 



In this method, schools and districts are evaluated by how many students are proficient or how well 
students are growing. Students would either have to demonstrate academic proficiency in tested content 
(status) or adequate growth toward proficiency in content. Quadrant I characterizes students who have 
met the status requirement but have demonstrated low growth. Quadrant II characterizes students who 
have met both the proficiency and growth expectations. Quadrant III characterizes students who have 
met neither the status nor the growth requirement, and quadrant IV captures those students who have 
not met the status requirement but have met the growth expectation. Conceptually, students in quadrants 
I, II, and IV are meeting at least one requirement (status or growth). Students in quadrant III are meeting 
neither. Schools or districts that have substantial numbers of students in quadrant III might be identified 



28. The negative growth reflected in the performance of EL students at the 25th percentile (i.e., the bottom quartile 
of the performance distribution) is not unusual for students at higher ELP levels. (See Cook et al. 2008; Linquanti et al. 
forthcoming.) 
by the accountability system for further examination, and perhaps provided support. The proportion of 
students in quadrants I, II, and IV could be used in lieu of the percent proficient “status bar” in content 
area performance. Schools or districts doing well would have acceptable percentages of students in 
quadrants I, II, and IV. 

The underlying argument of the status and growth accountability matrix approach is that ELs’ 
demonstrating sufficient growth in academic content test performance obviates the need to adjust for 
their actual or expected ELP level. All students would be held to the same status or growth 
expectations. 29 
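
The quadrant logic of the status and growth accountability matrix can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and how "adequate growth" is determined is left outside the sketch because that decision is a policy choice.

```python
def quadrant(is_proficient: bool, has_adequate_growth: bool) -> str:
    """Return the Exhibit 22 quadrant for one student."""
    if is_proficient:
        return "II" if has_adequate_growth else "I"
    return "IV" if has_adequate_growth else "III"


def share_meeting_expectation(students) -> float:
    """Share of students in quadrants I, II, or IV.

    students: iterable of (is_proficient, has_adequate_growth) pairs.
    """
    students = list(students)
    if not students:
        raise ValueError("at least one student is required")
    met = sum(1 for status, growth in students if quadrant(status, growth) != "III")
    return met / len(students)


# Hypothetical school with one student in each quadrant.
example = [(True, False), (True, True), (False, False), (False, True)]
example_share = share_meeting_expectation(example)
```

In this made-up case three of the four students meet at least one requirement, so the share that could stand in for the percent proficient "status bar" is 0.75.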

Method Examples 

To illustrate their application, the methods described above are presented through worked examples. 
Only one grade is used (third grade) in these examples for economy of presentation, but potential users 
of these methods should be prepared to apply them to all tested grades. Further, as has occurred with 
methods illustrated elsewhere in this report, there will likely be variability in findings across grades. Thus, 
expert counsel and deliberation among appropriate stakeholder representatives will be necessary to 
establish final recommendations in the application of any of these methods. Given the complexities 
involved, the following worked examples are presented step-by-step in order to clearly describe both the 
procedures and decisions made in carrying out each method. Certainly, different procedures or decisions 
might be made that would prove equally plausible, and possibly better, given local preferences and 
constraints. Descriptions provided here are intended to be illustrative only and to stimulate discussion, 
inspire alternative approaches, and spur further research. After each method application, descriptive 
statistics are provided that highlight differences among content proficiency results with and without the 
application of the methods. 

Method 1 (Progressive Benchmarking) 

Two progressive benchmarking methods are applied in this section. The first method adjusts scale scores 
based on student distributions and creates a scaling factor. The second adjusts how ELs at different 
language proficiency levels are counted based on students’ likelihood of being proficient on content 
assessments. 

1.a. Adjusted Scale Score Method 

This method takes the scale score distribution of EL students on content assessments (see Exhibit 18) 
and creates adjustments to those scores, which will in turn affect content proficiency scores. The 
following steps outline how this method is applied. 

1. Identify the EL and non-EL scale score distributions on relevant content assessments. 

The two tables in Exhibit 23 below show the 25th, 50th, and 75th percentile ranks for non-EL 
and for EL students by ELP level for mathematics and ELA. Note that the ELs used to create 
this distribution have been part of EL programs for three years or less (i.e., only students in EL 
programs from first grade or later were used). The shaded cells highlight those points where 
scores are above the proficient level (proficient = 350). 



29. Determining how much academic growth is sufficient to be judged “on track” is a critical decision in employing 
this method. 
Exhibit 23a. 

Distribution of Grade 3 Mathematics Scale Scores for ELs (by ELP Level) and 
Non-ELs, in Education Agency 1 

Groups          25th Percentile   50th Percentile   75th Percentile
ELs Level 1     245               282               336
ELs Level 2     287               325               377*
ELs Level 3     320               364*              407*
ELs Level 4     346               377*              437*
ELs Level 5     361*              417*              469*
Non-ELs         341               399*              451*

Exhibit reads: Third-grade EL students at ELP level 1 with a mathematics scale score of 245 points were at 
the 25th percentile, whereas the 25th percentile score for non-EL students was 341. 

* The shaded cells highlight those points where scores are above the proficient level (proficient = 350). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit 23b. 

Distribution of Grade 3 ELA Scale Scores for ELs (by ELP Level) and Non-ELs, 
in Education Agency 1 

Groups          25th Percentile   50th Percentile   75th Percentile
ELs Level 1     224               242               266
ELs Level 2     250               272               296
ELs Level 3     281               307               330
ELs Level 4     296               326               356*
ELs Level 5     315               347               384*
Non-ELs         311               342               377*

Exhibit reads: Third-grade EL students at ELP level 1 with an ELA scale score of 224 points were at the 
25th percentile, whereas the 25th percentile score for non-EL students was 311. 

* The shaded cells highlight those points where scores are above the proficient level (proficient = 350). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



2. Determine a scale score to apply to the adjustment formula. That is, what percentile rank 
and corresponding scale score will be used to make the adjustment? For illustrative purposes, the 
75th percentile rank and its associated scale score will be used. 

3. Create a scale score adjustment factor^30 using the following formula: 



30. A different and potentially more rigorous procedure might use linked and equated item response theory (IRT) 
theta values corresponding to the 75th percentile rank (if IRT is the measurement model employed) to make 
adjustments. Such information was not available in the current dataset. 
\[
\text{Scale Score Adjustment Factor} = \frac{\text{Proficient Scale Score Value}}{\text{EL Scale Score Value at 75th Percentile}}
\]

Note that for mathematics, only ELs at ELP level 1 have a scale score value less than 350 at the 
75th percentile. In this case the scale adjustment factor in mathematics will only apply to 
level 1 students. In ELA, ELs up to ELP level 3 will receive the scale score adjustment. 

4. Determine the appropriate timelines for EL students to receive adjustments (i.e., apply 
adjustments to students who are either “on track” in their ELP growth or apply what should be 
the value for their “on-track” ELP levels on the basis of time in the school system). For this 
analysis, the expected ELP growth timelines depicted in Exhibit 19 are used. 

5. Apply the scale score adjustment factor to EL students’ original scale scores to generate 
adjusted scale score values. 

The following example illustrates how adjusted scores are generated and applied. Only ELP 
level 1 students who are in their initial year of EL program enrollment would receive a mathematics scale 
score adjustment (see Exhibit 23a). The adjustment factor for these students is 1.04 (350/336). If an 
ELP level 1 student in her initial year in an EL program received a mathematics scale score of 320, her 
adjusted score would be 333 (1.04 x 320). This would not be sufficient to be classified as proficient 
(meeting AYP). If a similar ELP level 1 student received a mathematics scale score of 338 (not 
proficient), her adjusted score would be 352 (1.04 x 338). This student would be identified as proficient 
for accountability purposes. 

In ELA, EL students up to ELP level 3 would receive adjustments because ELP level 4 students at the 
75th percentile have scores above the proficient cut point of 350 (see Exhibit 23b). To illustrate: An EL 
student in her second year in an EL program is at ELP level 2. Last year, she was also at level 2 (i.e., she 
did not advance in ELP level from last year to the year in consideration). She would not receive the ELP 
level 2 adjustment; rather, she would receive the level 3 adjustment, 1.06 (350/330). If she received an 
ELA scale score of 315, her adjusted score would be 334 (not proficient). To repeat: Only those EL 
students within expected ELP levels and time frames would receive scale score adjustments. The two 
tables in Exhibit 24 below show the scale score adjustments (shaded) by years in program and ELP level. 
Exhibit 24a. 

ELP Level Scale Score Adjustment Factor to be Applied to Grade 3 Mathematics Results 

              Years in Program
ELP Level     0 to 1    2        3        4
1             1.04*     1.00     1.00     1.00
2             1.00      1.00     1.00     1.00
3             1.00      1.00     1.00     1.00
4             1.00      1.00     1.00     1.00
5             1.00      1.00     1.00     1.00

Exhibit reads: The scale score adjustment factor in grade 3 mathematics for EL students at ELP level 1 is 
1.04 (350/336). Given that the factors in all other cells in the table are 1.00, the adjustment would not be 
applied to any other EL students. 

Note: * Cells having a value above 1.00 are shaded. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 





Exhibit 24b. 

ELP Level Scale Score Adjustment Factor to be Applied to 
Grade 3 English or Language Arts Results 

              Years in Program
ELP Level     0 to 1    2        3        4
1             1.32*     1.18*    1.06*    1.00
2             1.18*     1.06*    1.00     1.00
3             1.06*     1.00     1.00     1.00
4             1.00      1.00     1.00     1.00
5             1.00      1.00     1.00     1.00

Exhibit reads: The scale score adjustment factor in grade 3 ELA for EL students at ELP level 1 is 1.32. Any 
other cells in the table with the value of 1.00 indicate that the adjustment would not be applied to EL 
students in that cell. 

Note: * Cells having a value above 1.00 are shaded. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 
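
The Adjusted Scale Score steps above can be sketched in Python. The 75th-percentile values come from Exhibits 23a and 23b; the function names, the rounding of factors to two decimals, and the rule that shifts a student one level per year in program (inferred from the layout of Exhibits 24a and 24b) are assumptions of this sketch rather than a definitive implementation.

```python
PROFICIENT_CUT = 350  # proficient scale score for this grade

# 75th-percentile scale scores for ELs by ELP level (Exhibits 23a and 23b).
P75 = {
    "math": {1: 336, 2: 377, 3: 407, 4: 437, 5: 469},
    "ela": {1: 266, 2: 296, 3: 330, 4: 356, 5: 384},
}


def adjustment_factor(subject: str, level: int) -> float:
    """Proficient cut divided by the 75th-percentile score, never below 1.00."""
    factor = PROFICIENT_CUT / P75[subject][level]
    return max(1.0, round(factor, 2))


def effective_level(level: int, years_in_program: int) -> int:
    """ELP level whose factor applies, given time in program.

    Years 0 and 1 both count as the initial year; each later year moves the
    student one level up the table, capped at level 5.
    """
    return min(level + max(years_in_program, 1) - 1, 5)


def adjusted_score(score: float, subject: str, level: int, years_in_program: int) -> int:
    """Apply the time-sensitive adjustment factor to an original scale score."""
    factor = adjustment_factor(subject, effective_level(level, years_in_program))
    return round(score * factor)
```

For example, `adjusted_score(338, "math", 1, 1)` gives 352 (proficient), and `adjusted_score(315, "ela", 2, 2)` gives 334 (not proficient), matching the worked examples in the text.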



1.b. Adjusted Counts Method 

The Adjusted Counts Method differs from the Adjusted Scale Score Method in that how EL students are 
counted is what is adjusted, not students’ scale scores. The usual percent proficient calculation on a 
content assessment is obtained by taking the total number of students receiving a proficient score and 
dividing that number by the total number of students participating (or who should be participating) on 
that assessment. The Adjusted Counts Method draws on empirical evidence showing that the likelihood 
of ELs at the lowest ELP levels reaching content proficiency is very small (see Exhibit 18). If the 
likelihood of attaining content proficiency is known, it can be used to adjust the way students are 
counted in the denominator of the content proficiency calculation. The following steps describe how to 
apply this method. 

1. Identify the likelihood of attaining content proficiency for ELs who are “on track” to 
meeting the English language proficient performance standard. Using logistic regression, 
probabilities are calculated, and Exhibit 25 below shows results from such an analysis. The 
exhibit displays the probability for ELs and non-ELs being proficient on content assessments of 
Education Agency 1. From this third-grade example, non-EL students have a 0.692 probability 
of being proficient in mathematics. They have a lower probability of being proficient in ELA 
(0.437). 



Exhibit 25. 

Probability of Being Proficient on Grade 3 Content Assessment for ELs and Non-ELs, 
Education Agency 1 

Groups            Mathematics   ELA
ELs: Level 1      0.211         0.016
ELs: Level 2      0.392         0.034
ELs: Level 3*     0.629         0.152
ELs: Level 4      0.753         0.438
ELs: Level 4*     0.890         0.786
Non-ELs           0.692         0.437

Exhibit reads: Based on logistic regression, the predicted probability for being proficient on grade 3 
mathematics is 0.211 for EL students at ELP level 1, and 0.692 for non-EL students. In the next column, 
both EL and non-EL students show a lower probability of being proficient in ELA. 

Notes: * A conjunctive minimum criterion is used in defining this agency’s English language proficient performance 
standard. That is, to be considered English proficient, ELs must attain a minimum composite score and 
minimum domain scores. Level 4* represents this agency’s English-Language-Proficient performance standard. 
Level 3* (representing a conjunctive minimum of level 3 composite, with all domains level 2 or higher) is used 
to increase the difference in probability between ELP Levels 2 and 3 for adjusting content proficiency 
expectations upward. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



As expected, ELs at lower ELP levels have lower probabilities of being proficient on academic 
subject tests. Interestingly, for this Education Agency at this grade level, students at the 
English-language-proficient performance standard (level 4*) have greater probabilities of being proficient 
in ELA and in math than non-ELs. 

2. Establish counting factors based on probability estimates. At this juncture in the process, 
deliberation with EL experts and stakeholders is critical to decide adjustment values for 
application. In the current example, adjustments for probability estimates in mathematics are 
employed by taking the obtained probability estimate rounded to the nearest decimal place: 
Level 1 = 0.2, Level 2 = 0.4, Level 3 = 0.6, and Level 4 (not meeting conjunctive minimum) 
= 0.8. Deciding adjustments for probability estimates at each ELP level in ELA, however, is 
more challenging because the change in probability from ELP levels 1 to 2 is quite modest, and 
from levels 2 to 3, levels 3 to 4, and level 4 to 4* is quite dramatic (see Exhibit 25). Rounding to 
the nearest decimal place in ELA generates the following adjustments: Levels 1 and 2 = 0.0,^31 
Level 3 = 0.2, and Level 4 = 0.4. For this example, therefore, slightly more rigorous adjustments 
are adopted: Level 1 = 0.1, Level 2 = 0.2, Level 3 = 0.4, and Level 4 (not meeting conjunctive 
minimum) = 0.5. Ultimately, adjustment values, though informed by empirical methods, are 
policy decisions; accordingly, others may arrive at different values for equally defensible reasons. 

3. Determine the appropriate timelines for EL students to receive adjustments (i.e., apply 
adjustments to EL students who are “on track” in their ELP progress), while for those not on 
track, utilize the value that should be used were they at their “on-track” ELP levels based on 
time in the state school system. As above, for this analysis, the timelines outlined in Exhibit 19 
are used. 

4. Create count adjustment tables for calculating adjusted student weights. The tables in 
Exhibit 26a and 26b show adjustments (shaded) made by ELP level and time in program. 



Exhibit 26a. 

ELP Count Adjustment Values for Mathematics 

              Years in Program
ELP Level     0 to 1    2        3        4
1             0.20†     0.40†    0.60†    0.80†
2             0.40†     0.60†    0.80†    1.00
3             0.60†     0.80†    1.00     1.00
4             0.80†     1.00     1.00     1.00
4*, 5         1.00      1.00     1.00     1.00

Exhibit reads: The count adjustment value in mathematics for ELs at ELP level 1 and with 0 to 1 year in 
program was estimated to be 0.20. Any other cells in the table with the value of 1.00 indicate that the 
adjustment would not be applied to EL students in that cell. 

Notes: The obtained probability estimates were rounded to the nearest decimal place. 
4* represents this agency’s English-Language-Proficient performance standard. 
† Cells having a value below 1.00 are shaded. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



31. That is, this statistical technique yields values suggesting that there is virtually no probability of ELs at ELP 
levels 1 and 2 attaining proficiency on the state’s ELA assessment. Indeed, this fact may lead some policy analysts to 
advocate the indexed progress method, discussed below. 
Exhibit 26b. 

ELP Count Adjustment Values for English or Language Arts 

              Years in Program
ELP Level     0 to 1    2        3        4
1             0.10†     0.20†    0.40†    0.50†
2             0.20†     0.40†    0.50†    1.00
3             0.40†     0.50†    1.00     1.00
4             0.50†     1.00     1.00     1.00
4*, 5         1.00      1.00     1.00     1.00

Exhibit reads: The count adjustment value in ELA for ELs at ELP level 1 and with 0 to 1 year in program 
was estimated to be 0.10. Any other cells in the table with the value of 1.00 indicate that the adjustment 
would not be applied to EL students in that cell. 

Notes: The obtained probability estimates were rounded to the nearest decimal place. 
4* represents this agency’s English-Language-Proficient performance standard. 
† Cells having a value below 1.00 are shaded. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 





5. Apply count adjustment values to calculate EL progress-to-proficiency trajectory results. 

To calculate results for the EL subgroup, 32 the Adjusted Count formula would be 

\[
\frac{\text{Number of [Eligible Former EL + Current EL] Students Proficient on Assessment}}{\text{Number of Eligible Former EL Students} + \sum \text{Count Adjustment Values for Current ELs}}
\]

To calculate results for current ELs only, the Adjusted Count formula would be 

\[
\frac{\text{Number of Current EL Students Proficient on Assessment}}{\sum \text{Count Adjustment Values for Current ELs}}
\]

6. Calculate EL progress-to-proficiency trajectory results based on one of the above 
formulas. As a simple example, imagine that a school has 40 third-grade language-minority 
students, of whom 30 are current ELs and 10 are former (exited) ELs. 33 On this year’s ELA 
assessment, eight of the 10 former-EL students scored proficient, while five of the 30 current 
ELs received a proficient score. What is the result for the EL subgroup? Without adjustment, 
32.5 percent (13/40) of third-grade EL subgroup students in this school are deemed proficient 
in ELA. Of the 30 current ELs, however, 10 are new students this year and are at ELP level 1. 
Another 10 are ELs who have been in an EL program for two years and are at ELP level 2, and 
the third 10 are ELP level 4 students who have been in an EL program for three years. When the 
count adjustments are applied to these students (Level 1 student = 0.1, Level 2 student = 0.4, 
and Level 4 student = 1.0), the percentage of the EL subgroup meeting the expected 
performance standard changes from 32.5 percent to 52 percent: 



32. Under current federal regulation, former ELs may be included in EL subgroup results for the two years 
following their exit from EL status. 

33. For this example, we assume all former EL students exited EL status within the past two years. 
\[
\frac{8 + 5}{10 + [(10 \times 0.1) + (10 \times 0.4) + (10 \times 1.0)]} = \frac{13}{25} = 52\%
\]



Although this hypothetical example includes eligible former ELs in the calculation, results also 
can be calculated to examine outcomes for current ELs only in order to illustrate more clearly 
the effect of the method’s application. Without adjustment, the results for this school’s 
third-grade current EL students would be 16.7 percent (5/30). Applying the Adjusted Count 
formula using the results and characteristics of current ELs described above, the percentage of 
current ELs meeting the expected performance standard changes from 16.7 percent to 
33.3 percent: 



\[
\frac{5}{(10 \times 0.1) + (10 \times 0.4) + (10 \times 1.0)} = \frac{5}{15} \approx 33.3\%
\]

In both cases — for the EL subgroup including eligible former ELs, and for current ELs only — 
the adjustments increase the overall percentage meeting the expected performance standard 
because ELP levels 1 and 2 students are counted using an ELP level/time-sensitive weight 
compared with former-EL students or current ELs at higher ELP levels or in EL programs for 
extended time periods. 
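
The Adjusted Count calculation in step 6 can be reproduced with a short Python sketch. The function name and the way groups are represented (pairs of student count and count adjustment value) are assumptions made for illustration; the numbers are those of the hypothetical school described above.

```python
def adjusted_percent_proficient(n_proficient, current_el_groups, n_former_el=0):
    """Percent meeting the expected standard under the Adjusted Count formula.

    n_proficient: number of proficient students in the numerator.
    current_el_groups: list of (number_of_students, count_adjustment_value).
    n_former_el: eligible former ELs, each counted with a weight of 1.
    """
    denominator = n_former_el + sum(n * weight for n, weight in current_el_groups)
    if denominator <= 0:
        raise ValueError("adjusted denominator must be positive")
    return 100.0 * n_proficient / denominator


# Hypothetical school from the text: 10 students at level 1 in year 1 (0.1),
# 10 at level 2 in year 2 (0.4), and 10 at level 4 in year 3 (1.0).
groups = [(10, 0.1), (10, 0.4), (10, 1.0)]

subgroup_result = adjusted_percent_proficient(8 + 5, groups, n_former_el=10)
current_el_result = adjusted_percent_proficient(5, groups)
```

Here `subgroup_result` is 52 percent and `current_el_result` is about 33.3 percent, compared with unadjusted values of 32.5 percent (13/40) and 16.7 percent (5/30).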

The ELP expected growth timelines (Exhibit 19) and the scale score or count adjustment tables 
(Exhibits 24a and 24b, or Exhibits 26a and 26b, respectively) were used in applying the two progressive 
benchmarking methods to Education Agency 1’s third-grade EL data. Exhibit 27 presents outcomes of 
applying these methods to results of current EL students in third grade in order to give a sense of their 
effect. 



Exhibit 27. 

Content Proficiency Outcome Comparisons of Progressive Benchmarking Methods, for 
English Learners in Grade 3 (N = 18,101), Education Agency 1 

Method                                                        Percent Proficient
Mathematics Proficiency (no method applied)                   39.3%
1.a. Mathematics Proficiency using Scale Score Adjustments    39.4%
1.b. Mathematics Proficiency using Count Adjustments          42.0%
ELA Proficiency (no method applied)                           6.3%
1.a. ELA Proficiency using Scale Score Adjustments            7.6%
1.b. ELA Proficiency using Count Adjustments                  7.0%

Exhibit reads: For EL students in third grade, 39.3 percent scored proficient or above in mathematics 
without adjustments; when progressive benchmarking methods 1.a and 1.b were applied, higher proportions of EL 
students were classified as proficient (39.4 percent and 42.0 percent, respectively). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



As expected, both progressive benchmarking methods yield higher proportions of EL students classified 
as proficient (meeting AYP). The scale score adjustment method shows the smallest percentage point 
increase in mathematics (0.1 percentage points) and the largest in ELA (1.3 percentage points). The 
student count adjustment method yields an increase of 2.7 percentage points for mathematics and 
0.7 percentage points for ELA.

Method 2 (Indexed Progress)

In the worked example, this method is applied only to ELA. The intent is to use gain in ELP as a 
limited-term proxy for emerging performance in ELA. The following steps illustrate the application of 
this method. 

1. Identify the progress model to be used to define ELP growth. As previously noted, there
has been much recent state experimentation with progress and growth models, especially relating
to AYP. Also, some exploratory research has been conducted regarding growth models and ELs
(e.g., Cook and others 2008; Cook and Zhao 2011). Several growth-based procedures are
available (e.g., simple gain-based models, percentile growth charts, student growth percentiles,
value tables, value-added models). The model applied here is percentile growth charts, which
calculate composite scale score gains and rank them by ELP level. The purpose is to identify the
distribution of growth scores for each ELP level. In Exhibit 28, the box plot displays composite
ELP assessment scale score gains for ELs from second to third grade in Education Agency 1.



Exhibit 28.

Composite ELP Assessment Scale Score Gains for ELs From Second to Third Grade
(2007-08), Education Agency 1

[Box plot of composite scale score gains by ELP level; figure not reproduced.]

ELP performance levels for grade 2 (in scale score points):
Level 1 = Beginning (215-396)
Level 2 = Early Intermediate (397-446)
Level 3 = Intermediate (447-495)
Level 4 = Early Advanced (496-539)
Level 5 = Advanced (540-635)

Exhibit reads: The box plots show that as students’ ELP level increases, the median and
average ELP assessment growth values tend to decrease.

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets.



Note that as a student’s ELP level increases, the median and average ELP assessment growth 
values decrease. (Recall this pattern was also observed earlier in Exhibit 21.) This suggests that 
states might well need to set different growth expectations for EL students at different ELP 
levels. 

2. Once an ELP progress model is adopted, establish expectations for “acceptable” ELP
growth. In prior worked examples, the 75th percentile level was adopted as a criterion. For
consistency, it is adopted for this approach as well. Using values from Exhibit 21, this education
agency could posit that to meet ELP growth expectations, EL students must demonstrate the
following composite scale score gains (in scale score units): ELP Level 1 = 182, ELP Level 2 =
69, ELP Level 3 = 48, and ELP Level 4 = 21 (see footnote 34).

3. Determine the time frame for EL students to receive adjustments (i.e., whether to apply
adjustments to students who are “on track” in their ELP development, or which adjustment should
be applied were they at their “on-track” ELP levels by time in the state school system). As with
prior methods, the expected time frames outlined in Exhibit 19 are used.

4. Create an Indexed Progress Gain Table, from which acceptable ELP scale score gain 
can be determined for ELA proxy purposes. A sample table is shown below in Exhibit 29. 
The expectations for ELP progress by year in the state school system are similar to previous 
methods. ELP level 1 students in their initial year are expected to make the greatest ELP gains. 
Level 4 students in their initial year are expected to make the smallest ELP gains. The table cells 
with dashes are not part of the Indexed Progress Method (i.e., ELP growth is not used as a 
proxy for ELA performance). For EL students represented by these cells with dashes, their ELA 
assessment result is used. 



Exhibit 29.

Indexed Progress Gain Values (in ELP Assessment Composite Scale Score Units)
as Proxy for English or Language Arts, by Student ELP Level
and Years in State School System

| ELP Level | Years in Program: 0 to 1 | Years in Program: 2 | Years in Program: 3 | Years in Program: 4 |
|---|---|---|---|---|
| 1 | 182 | 69 | 48 | 21 |
| 2 | 69 | 48 | 21 | -- |
| 3 | 48 | 21 | -- | -- |
| 4 | 21 | -- | -- | -- |
| 5 | -- | -- | -- | -- |

Exhibit reads: The indexed progress gain value estimated for ELs at ELP level 1 and with 0 to 1 year in
program was 182 ELP assessment composite scale score units.

Note: This table was based on 75th percentile values from Exhibit 21 and appropriate timelines for EL students to
receive adjustments from Exhibit 19. Thus, if an EL student who started at ELP level 1 has been in an EL program for
two years, an indexed progress gain value of 69 would be used. If the student’s ELP scale score gain was equal
to or higher than that value of 69, the student would be counted as meeting the ELP performance standard for
accountability calculations.

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets.





5. Apply indexed progress results as proxy for ELA assessment performance for eligible
ELs. The ELP levels and timelines as displayed in Exhibit 29 are applied for these EL students.
If a student is at or exceeds the expected ELP scale score gain value, she is counted as meeting
the expected ELA performance standard (meeting AYP) for accountability calculations.

Exhibit 30 below shows the difference in ELA percent proficient outcomes with and without 
the Indexed Progress method applied to third-grade ELs. 
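
Steps 4 and 5 amount to a table lookup followed by a comparison. The sketch below uses the gain values from Exhibit 29; the dictionary layout and the function name `counts_as_ela_proficient` are illustrative assumptions. It also assumes that an eligible student who is already proficient on the ELA assessment is still counted as proficient, a case the steps above do not address explicitly.

```python
# Exhibit 29 gain values, keyed by (initial ELP level, years in program).
# The "0 to 1" column is stored under key 1; cells with dashes are absent.
INDEXED_GAIN = {
    (1, 1): 182, (1, 2): 69, (1, 3): 48, (1, 4): 21,
    (2, 1): 69, (2, 2): 48, (2, 3): 21,
    (3, 1): 48, (3, 2): 21,
    (4, 1): 21,
}


def counts_as_ela_proficient(initial_elp_level, years_in_program,
                             elp_gain, ela_proficient):
    """Return True if the student counts as meeting the ELA standard."""
    key = (initial_elp_level, max(1, years_in_program))
    required_gain = INDEXED_GAIN.get(key)
    if required_gain is None:
        # Not eligible for the proxy: use the ELA assessment result.
        return ela_proficient
    # Assumption: actual ELA proficiency also counts for eligible students.
    return ela_proficient or elp_gain >= required_gain


# A level 1 student in year 2 needs a gain of at least 69.
print(counts_as_ela_proficient(1, 2, elp_gain=75, ela_proficient=False))   # True
print(counts_as_ela_proficient(1, 2, elp_gain=60, ela_proficient=False))   # False
print(counts_as_ela_proficient(5, 1, elp_gain=200, ela_proficient=False))  # False
```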



34. These values correspond to the value of the top line of each box plot in Exhibit 27.

Exhibit 30.

ELA Proficiency Outcome With and Without Indexed Progress Method Applied,
for ELs in Grade 3 (N = 18,101), Education Agency 1

| Method | Percent Proficient |
|---|---|
| ELA Proficiency (no method applied) | 6.3% |
| 2. ELA Proficiency using Indexed Progress | 17.4% |

Exhibit reads: Without an adjustment, 6.3 percent of EL students at grade 3 scored proficient or above in
the ELA performance assessment. After applying the indexed progress method, 17.4 percent of EL students
were counted as being proficient in ELA.

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets.



The Indexed Progress Method identifies much greater proportions of third-grade EL students as
meeting the ELA performance standard (being “proficient”) because it places greater value on the ELP
progress of more recently arrived ELs at lower ELP levels, as a temporary proxy for ELA performance.
The percentage point increase in the ELA outcome is much greater (11.1 percentage points) than that
observed for the two Progressive Benchmarking methods above.

Method 3 (Status and Growth Accountability Matrix) 

As previously noted, this approach does not use an EL student’s ELP level as a mediating factor. This
method assumes that either growth in content performance or attainment of content proficiency is
sufficient for accountability for all students, including ELs. This approach assumes a growth model is
applied to content area assessments. Its application requires the following steps.

1. Identify an appropriate growth model. As mentioned earlier, there is much current
experimentation with different types of gain or growth models. Generally, states currently use
three types of growth models: value tables, student growth percentiles, or value-added models.
Detailed descriptions of these models are beyond the scope of this chapter.

2. Apply a growth model to all students. This example uses student growth percentiles (SGP) as 
applied by Betebenner (2008). The SGP tool was used to calculate growth percentile results for 
all students in our sample education agency’s second-to-third-grade dataset. 

3. Establish acceptable growth values. The SGP tool provides growth percentile scores for all 
students with sufficient data. Following previous examples, a student growth percentile of 75 
was adopted as the growth criterion for this example. 

4. Create a status and growth accountability matrix (SGAM). Exhibit 31 shows the matrix used in 
this example. 



35. See, for example, Auty and others (2008) and Betebenner (2009).

Exhibit 31.

Status and Growth Accountability Matrix

| Status on Content Assessment | Growth on Content Assessment: Low Growth | Growth on Content Assessment: High Growth |
|---|---|---|
| Proficient or Above on Content Assessment | I: Content Scale Score > 350 and Student Growth Percentile < 75 | II: Content Scale Score > 350 and Student Growth Percentile > 75 |
| Not Proficient on Content Assessment | III: Content Scale Score < 350 and Student Growth Percentile < 75 | IV: Content Scale Score < 350 and Student Growth Percentile > 75 |

Exhibit reads: ELs in quadrant I scored proficient or above on the content assessment but exhibited
growth on the content assessment that was lower than the growth recorded by the fastest growing 25 percent
of ELs.



5. Determine weights to calculate the percentage of students meeting the status and
growth criteria. In this example, if a student’s status and growth fell within quadrants I, II, or
IV, he received a weight of 1. If a student’s status and growth fell within quadrant III, he was
assigned a weight of 0. Accountability percentages are calculated as the sum of students receiving
a 1 divided by the total number of students. Exhibit 32 shows differences in outcomes obtained
by applying this method to all third-grade students’ results for mathematics and ELA.
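
The quadrant assignment and weighting in steps 4 and 5 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run for the report: the function names are hypothetical, the sample records are invented, and boundary cases (a scale score of exactly 350 or a growth percentile of exactly 75) are assumed to count as proficient and high growth, which Exhibit 31 does not specify.

```python
PROFICIENT_CUT = 350   # content scale score cut shown in Exhibit 31
HIGH_GROWTH_SGP = 75   # growth criterion adopted in step 3


def quadrant(scale_score, sgp):
    """Assign a student to quadrant I-IV of the status and growth matrix."""
    proficient = scale_score >= PROFICIENT_CUT  # assumption: the cut itself counts
    high_growth = sgp >= HIGH_GROWTH_SGP        # assumption: 75 itself counts
    if proficient:
        return "II" if high_growth else "I"
    return "IV" if high_growth else "III"


def sgam_percent(students):
    """Percent of students weighted 1 (quadrants I, II, and IV)."""
    weights = [0 if quadrant(score, sgp) == "III" else 1
               for score, sgp in students]
    return 100.0 * sum(weights) / len(weights)


# Hypothetical (scale score, student growth percentile) pairs.
sample = [(360, 40), (380, 90), (300, 20), (320, 80)]
print([quadrant(s, g) for s, g in sample])  # ['I', 'II', 'III', 'IV']
print(sgam_percent(sample))                 # 75.0
```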
Exhibit 32.

Comparison of Content Proficiency Outcomes With and Without the
Status and Growth Accountability Matrix (SGAM) Method Applied,
for All Grade 3 Students (N = 48,394), Education Agency 1

| Method | Percent Proficient |
|---|---|
| Mathematics Proficiency (no method applied) | 59.4% |
| 3. Mathematics Proficiency using SGAM Method | 61.6% |
| ELA Proficiency (no method applied) | 29.9% |
| 3. ELA Proficiency using SGAM Method | 39.3% |

Exhibit reads: For all grade 3 students, including both ELs and non-ELs, 59.4 percent scored proficient or
above in mathematics without an adjustment. After applying the SGAM method, 61.6 percent were
proficient or above (an increase of 2.2 percentage points).

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets.



The SGAM method yields higher proportions of students identified as making acceptable growth or 
attaining content proficiency. Since the results shown are for all students, Exhibit 33 disaggregates 
outcomes for ELs versus non-ELs. 



Exhibit 33.

Comparison of Content Proficiency Outcomes With and Without the Status and Growth
Accountability Matrix (SGAM) Method Applied, for EL and Non-EL Students
in Grade 3, Education Agency 1

| Group | N | Method | Percent Proficient |
|---|---|---|---|
| Non-EL | 30,293 | Mathematics Proficiency (no method applied) | 71.2% |
| Non-EL | 30,293 | 3. Mathematics Proficiency using SGAM Method | 72.5% |
| Non-EL | 30,293 | ELA Proficiency (no method applied) | 44.0% |
| Non-EL | 30,293 | 3. ELA Proficiency using SGAM Method | 50.5% |
| EL | 18,101 | Mathematics Proficiency (no method applied) | 39.3% |
| EL | 18,101 | 3. Mathematics Proficiency using SGAM Method | 43.4% |
| EL | 18,101 | ELA Proficiency (no method applied) | 6.3% |
| EL | 18,101 | 3. ELA Proficiency using SGAM Method | 20.7% |

Exhibit reads: With no adjustment, 71.2 percent of grade 3 non-EL students were proficient in
mathematics, and with the SGAM method applied, 72.5 percent were proficient (an increase of
1.3 percentage points).

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets.



As Exhibit 33 illustrates, 71.2 percent of grade 3 non-EL students were proficient in mathematics with
no adjustment, and when the SGAM method is applied, 72.5 percent were so, an increase of
1.3 percentage points. For third-grade ELs, 39.3 percent were proficient in mathematics with no
adjustment, and under SGAM, 43.4 percent were so, an increase of 4.1 percentage points. ELs benefit
more than non-ELs in mathematics with this procedure. A substantially higher percentage point increase
for ELs occurs when the method is applied to ELA. EL percent proficient results increase by
14.4 percentage points (from 6.3 percent to 20.7 percent) when SGAM is applied, versus 6.5 percentage
points (from 44.0 percent to 50.5 percent) for non-ELs.

Comparison of Method Outcomes



The methods illustrated above yield varying differences in third-grade outcomes at the Education
Agency level. While some methods appear to have little effect on EL results at grade 3, greater effects
might be observed when all tested grades within a school are calculated and combined. In fact, there are
notable differences in outcomes at the school level, depending upon the salient characteristics of ELs.
Given that these methods set out to take into account newer, low-ELP-level EL students’ academic
performance in rigorous, meaningful ways (e.g., by ELP level, time in the school system, growth in ELP
or academics), the following exhibits show each method’s effects on third-grade current EL
outcomes, disaggregated by differing densities of new ELs (i.e., ELs with three years or less in an EL
program) for mathematics and ELA. In these exhibits, the term “Low” characterizes schools where fewer
than 5.5 percent (the 25th percentile) of their third-grade ELs would be classified as new and “High”
characterizes schools where 15.6 percent (the 75th percentile) or more of their third-grade ELs would be
classified as new.
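
The density grouping can be expressed as a simple threshold rule. In this sketch the cut points are the 25th and 75th percentile values quoted above, schools falling between them are labeled “Moderate” as in the exhibits that follow, and the function name and example counts are hypothetical.

```python
LOW_CUT = 5.5    # 25th percentile of new-EL density, in percent
HIGH_CUT = 15.6  # 75th percentile of new-EL density, in percent


def density_group(n_new_els, n_els):
    """Classify a school by the share of its third-grade ELs who are new."""
    pct_new = 100.0 * n_new_els / n_els
    if pct_new < LOW_CUT:
        return "Low"
    if pct_new >= HIGH_CUT:
        return "High"
    return "Moderate"


# Hypothetical schools, each with 50 third-grade ELs.
print(density_group(2, 50), density_group(5, 50), density_group(10, 50))
# Low Moderate High
```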



Exhibit 34.

Method Outcome Comparisons for ELs (N = 18,101) in Mathematics at Grade 3,
by Density of New ELs in Schools, Education Agency 1

Mean Percent Proficient in Mathematics, Schools Clustered by Density of New ELs

| Method | Statistic | Low (N=115) | Moderate (N=230) | High (N=113) | All Schools (N=458) |
|---|---|---|---|---|---|
| No method applied | Mean | 47% | 40% | 47% | 43% |
| | Std | 0.24 | 0.15 | 0.21 | 0.20 |
| 1.a. ELP Level Adjusted Scale Score Method | Mean | 47% | 40% | 47% | 43% |
| | Std | 0.24 | 0.15 | 0.21 | 0.20 |
| 1.b. ELP Level Adjusted Count Method | Mean | 48% | 42% | 56% | 47% |
| | Std | 0.25 | 0.16 | 0.30 | 0.23 |
| 3. Status and Growth Accountability Matrix Method | Mean | 51% | 44% | 49% | 47% |
| | Std | 0.24 | 0.15 | 0.21 | 0.20 |

Exhibit reads: When no adjustment methods were applied, 47 percent of third-grade ELs in schools with
low densities of new ELs were proficient in mathematics.

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets.



The third-grade sample is drawn from 458 schools in Education Agency 1. Across all schools, the
average percent proficient in mathematics for third-grade ELs is 43 percent (see the first row of the “All
Schools” column). Schools with low densities of new ELs show on average 47 percent of their EL
students are proficient in mathematics. Schools with moderate densities of new ELs averaged 40 percent
proficient, and schools with high densities of new ELs averaged 47 percent proficient. There are no
outcome differences across schools between the unadjusted (“no method applied”) and the adjusted
scale score progressive benchmarking method. The adjusted count progressive benchmarking method
shows a 1 percentage point increase among schools with low densities of new ELs, a 2 percentage point
increase among schools with moderate densities, and a 9 percentage point increase among schools with
high densities of new ELs. The SGAM method shows a 4 percentage point increase among schools with
low and moderate densities and a 2 percentage point increase among schools with high densities of new
ELs. Exhibit 35 compares outcomes of the various methods applied to ELA results.

Exhibit 35.

Method Outcome Comparisons for ELs (N = 18,101) in English or Language Arts at Grade
3, by Density of New ELs in Schools, Education Agency 1

Mean Percent Proficient in English Language Arts, Schools Clustered by Density of New ELs

| Method | Statistic | Low (N=115) | Moderate (N=230) | High (N=113) | All Schools (N=458) |
|---|---|---|---|---|---|
| No method applied | Mean | 10% | 7% | 9% | 8% |
| | Std | 0.17 | 0.07 | 0.11 | 0.11 |
| 1.a. ELP Level Adjusted Scale Score Method | Mean | 11% | 8% | 14% | 10% |
| | Std | 0.17 | 0.08 | 0.14 | 0.12 |
| 1.b. ELP Level Adjusted Count Method | Mean | 12% | 7% | 12% | 10% |
| | Std | 0.23 | 0.08 | 0.16 | 0.15 |
| 2. ELP Indexed Progress Method | Mean | 22% | 17% | 22% | 20% |
| | Std | 0.21 | 0.10 | 0.15 | 0.15 |
| 3. Status and Growth Accountability Matrix Method | Mean | 23% | 21% | 22% | 22% |
| | Std | 0.19 | 0.11 | 0.16 | 0.15 |

Exhibit reads: When no adjustment methods were applied, 10 percent of third-grade ELs in schools with
low densities of new ELs were proficient in ELA.

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets.



Across all schools, 8 percent of third-grade EL students scored proficient in ELA. Among low- and
high-density new EL schools, the percentage of ELs scoring proficient in ELA was 10 percent
and 9 percent, respectively, while 7 percent were proficient in moderate-density schools. When the
adjusted scale score progressive benchmarking method is applied, the percentage of ELs deemed
proficient among low-, moderate-, and high-density new EL schools increased by 1, 1, and 5 percentage
points, respectively. The adjusted count progressive benchmarking method increased outcomes by 2 and
3 percentage points for low- and high-density schools, respectively, but no change was observed among
moderate-density schools. The indexed progress method (utilized only on ELA) yielded increases of 12,
10, and 13 percentage points among schools with low, moderate, and high densities of new ELs,
respectively. This represents a much larger change in outcomes compared with the progressive
benchmarking methods. Finally, the SGAM method yielded increases of 13, 14, and 13 percentage points
among schools with low, moderate, and high densities of new ELs, respectively.

Summary and Caveats 

Of the different methods explored, the Status and Growth Accountability Matrix generated the greatest
percentage point differences in the results of third-grade EL students. This model does not adjust for
English language proficiency level, however. Also, all students, not just ELs, were used to generate the
student growth percentile values, which affected school results, although the method did yield a greater
change in outcomes for ELs than for non-ELs. Finally, the worked example above could not model
whether EL student academic growth at the 75th percentile (the standard for being considered “high
growth”), though quite rigorous, would be sufficient to ensure that such EL students are “on track” to
attain academic proficiency in a reasonable time frame. The Indexed Progress method yielded the next
highest range of percentage point differences, though this method was applied only to ELA results of
the restricted subset of eligible EL students. It is unclear how this method would influence mathematics
or ELA proficiency outcomes if some type of composite (e.g., ELP growth + mathematics achievement
or ELP growth + ELA achievement) were utilized. The Progressive Benchmarking methods generated
the smallest changes in outcomes. Of the two Progressive Benchmarking methods explored, the Adjusted
Counts method yielded greater differences than the Adjusted Scale Scores method, although
these were still very modest differences (1 to 2 percentage points) in the low- and moderate-density
new-EL schools, and 3 to 5 percentage points in the high-density new-EL schools. The change in outcomes
under these methods was likely modest because many of the eligible third-grade EL students in
Education Agency 1 to which these methods were applied had “timed out” of the possibility to benefit
from such adjustments due to their higher initial ELP level or lack of expected ELP progress by time in
the school system.

As evidenced above, implementing any of these methods requires important decisions. For example, the
75th percentile was chosen as a guiding criterion. Such decisions are more policy related than empirical.
The 75th percentile was chosen here to establish more rigorous expectations and to provide some degree
of consistency for method comparisons. Also, an ambitious time frame was utilized for expected
progress to the English-language proficient performance standard (i.e., four years for those initial
ELP-level 1 ELs, with proportionally less time for those beginning at higher initial ELP levels). Other
criteria could very well provide different scenarios and would certainly generate different results. As noted
throughout this report, expert stakeholders must understand the assumptions made within methods and how
empirical data are used, and must be presented with policy options in making decisions. There will be
differential impacts resulting from the chosen method and criteria, and experts must be informed to
support decisionmaking with a focus on ELs.

There are numerous caveats. First, the methods illustrated in this chapter are exploratory and meant to 
spur discussion and foster further research. They are by no means definitive and should not be viewed as 
such. Second, the outcomes generated were based only on one grade in one education agency. Other 
grades and different populations of students from other educational agencies might yield different 
results. Generating results for all grades tested was beyond the scope of this chapter, and may very well 
alter the outcomes for the education agency featured in the worked examples. Moreover, other education 
agencies with different characteristics may well have different findings. Any method explored should be 
applied to all grades for which data are available. Third, employing different criteria will likely result in 
different findings. A state may choose to examine these methods using optimal criteria. For example, 
such an approach might examine external criteria regarding school or teacher quality, and each method 
could then be compared using these schools. Analyses could be conducted to determine which method 
most often identifies schools with high-quality instruction. Fourth, the consequences of implementing 
these models must be considered. Not only should outcome data be generated, but careful consideration 
also should be paid to a method’s degree of intuitive appeal, comprehension, and perceived fairness, 
particularly among key stakeholder groups. This should be explored before adopting any method. Fifth, 
substantial statistical and database capacity is required to implement the methods described in this 
chapter, particularly for the Status and Growth Accountability Matrix method. Finally, the ultimate goal 
behind these methods is to more accurately determine and represent how ELs are performing on state 
content assessments, assessments which in many cases are not designed for low-ELP-level ELs. The 
methods presented here yield results that are influenced by the ELP and content assessments employed 
by this education agency, for its EL population. Differences in assessments, performance standards, and 
EL students will all affect how these methods function and the results each generates. 

Nevertheless, the empirical methods explored in this chapter can help policymakers begin to address key
factors highlighted throughout this report that shape assessment and accountability for English Learners:
namely, an EL’s level of English language proficiency fundamentally affects their academic performance
on assessments conducted in English; it takes ELs time to develop levels of ELP needed to benefit from
instruction in English and perform on these assessments; this time frame varies based on several factors,
including students’ level of initial English proficiency, age and grade on entry, and the quality of
instructional services they receive relative to their linguistic and academic needs; rigorous yet reasonable
time-based expectations for ELs to learn English for academic performance can be set and monitored;
and expectations for ELs’ progress toward academic proficiency can incorporate and reflect those
rigorous English-learning expectations.

While methods explored in this chapter are complex, they efficiently capture a complex reality facing EL
students and their educators. As such, they offer a start to help state decisionmakers develop
expectations that establish rigorous accountability and that address a fundamental need for fairness and
realism. Such methods may also stimulate discussion and experimentation among education agencies to
develop more nuanced ways of measuring and evaluating ELs’ academic progress and proficiency in
relationship to their initial ELP levels and time in the school system.

Appendix A. Decision Consistency Method



Adapted from several areas of measurement research (e.g., bias, reliability, and standard setting), this
decision-theory approach examines the relationship between two sets of measures, in this case, academic
content assessments and ELP assessments (Cook et al., 2009) (see Exhibit A.1 for a display of this
relationship).



Exhibit A.1.

State ELP and Academic Content Assessment Decision Matrix

| State Content Assessment Proficient Cut Score (Given) | State ELP Assessment Cut Score (TBD): Not Proficient (TBD) | State ELP Assessment Cut Score (TBD): Proficient (TBD) |
|---|---|---|
| Proficient | Proficient on content; below language proficient (Quadrant I)* | Proficient on content; proficient in language (Quadrant II) |
| Not Proficient | Below content proficient; below language proficient (Quadrant III) | Below content proficient; proficient in language (Quadrant IV)* |

* Cells in gray (marked here with an asterisk) are defined as inconsistent decisions.



The four cells characterize outcomes of decisions made with state content and ELP assessments. Cells in 
gray are defined as inconsistent decisions. Students classified in quadrant I scored proficient on the 
content assessment but less than proficient on the ELP assessment. Those classified in quadrant IV 
scored proficient on the ELP assessment but less than proficient on the given content assessment. The 
other cells are defined as consistent decisions based on results of both measures. 

Assuming that a state academic content assessment’s proficient performance standard is determined and 
fixed, this method allows users to identify the ELP assessment score band or value that maximizes the 
proportion of consistent decisions, thereby identifying a possible ELP performance standard for 
consideration. A decision consistency score or index value can be created for each individual score or 
score band as follows: 

$$\text{Decision consistency (DC)} = \frac{\text{quadrant II} + \text{quadrant III}}{\text{sum of quadrants I to IV}}$$

Using the above formula, DC values could be plotted across the ELP score band or values creating a 
graphic view of the change in decision consistency. Steps in creating the graph would be as follows: 

1. Determine a protocol for delineating ELP levels/score bands (e.g., how many bands, how to
subdivide them) with a separate band established for the state’s current ELP performance standard.

2. Select grades and/or grade bands to compare.

3. Calculate a DC score for each ELP level or band and plot.

If the hypothesis holds regarding the decreasing relationship between English language proficiency and
academic content performance after a certain ELP level is reached, then there should be an observable
decrease in the DC values on this graph. The point at which this decrease is observed is suggestive of
where the ELP performance standard might be considered. Typically, this relationship varies by subject
area and grade or grade span; so the method is expected to give not a single, definitive result but rather a
performance range for consideration. An example follows.

Assume the information in Exhibit A.2 represents ELP and English language arts (ELA) assessment
information from a state. This fictitious state has five ELP levels: Entering, Beginning, Intermediate,
Advanced, and Bridging. To calculate decision consistency, the study team will follow the steps mentioned
earlier. First, the five ELP levels are subdivided into ten bands. Each proficiency level band has a “low”
or “high” designator. In this state, each composite proficiency level has a scale score range. The scale
score point at the midpoint of each proficiency level range is then demarcated. Students below the midpoint
are classified as “low.” Students at or above that point are classified as “high.” The first column in
Exhibit A.2 shows the ten composite ELP bands created. The next two columns present the numbers of
students “Not Proficient” or “Proficient” on the state’s ELA assessment for each language proficiency level
band. For example, in the Advanced High band, 331 students were not proficient on the state’s ELA
assessment, and 178 were proficient. The last column lists the decision consistency percentage.
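
The midpoint rule for splitting each level into a “low” and a “high” band can be sketched as below. The scale score ranges are invented for illustration (the fictitious state’s actual ranges are not given), and the function name `elp_band` is likewise an assumption.

```python
# Hypothetical composite scale score ranges (inclusive) for the five ELP levels.
LEVEL_RANGES = {
    "Entering": (100, 199),
    "Beginning": (200, 299),
    "Intermediate": (300, 399),
    "Advanced": (400, 499),
    "Bridging": (500, 600),
}


def elp_band(scale_score):
    """Return a band label such as 'Advanced High' for a composite score."""
    for level, (low, high) in LEVEL_RANGES.items():
        if low <= scale_score <= high:
            midpoint = (low + high) / 2
            # Below the midpoint is "Low"; at or above it is "High".
            half = "Low" if scale_score < midpoint else "High"
            return f"{level} {half}"
    raise ValueError(f"score {scale_score} is outside every level range")


print(elp_band(420), "|", elp_band(450), "|", elp_band(575))
# Advanced Low | Advanced High | Bridging High
```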



Exhibit A.2.

Example Decision Consistency Table, Grade 5 English or Language Arts

| Composite ELP Level Bands | English Language Arts Assessment: Number Not Proficient | English Language Arts Assessment: Number Proficient | Decision Consistency Percentage |
|---|---|---|---|
| Entering Low | 4 | 0* | 37% |
| Entering High | 44 | 2* | 37% |
| Beginning Low | 61 | 0* | 39% |
| Beginning High | 114 | 6* | 41% |
| Intermediate Low | 124 | 13* | 46% |
| Intermediate High | 327 | 57* | 51% |
| Advanced Low | 245 | 81* | 63% |
| Advanced High | 331* | 178 | 70% |
| Bridging Low | 134* | 249 | 77% |
| Bridging High | 64* | 257 | 72% |
| Total | 1,448 | 843 | |

Notes: Total n = 2,291. Data are hypothetical.

* Shading in the two English or Language Arts assessment columns illustrates values used to calculate the
decision consistency percentage for Advanced High level.



Next, the decision consistency percentage is calculated for each proficiency level band. Keeping with the 
Advanced High as an example, students below the Advanced High band and not proficient on the ELA 
assessment are in quadrant III (i.e., classified below proficient on both assessments). Students at or 
above Advanced High and proficient on the ELA assessment are in quadrant II (classified above 
proficient on both assessments). Neither group is shaded in Exhibit A.2, consistent with Exhibit A.1. 
Note that the decision consistency approach is designed to support decisions about “language 
proficiency.” Describing students at or above Advanced High as proficient is somewhat of a misnomer; the 
condition is more appropriately stated as “if language proficiency were set at Advanced High or higher.” 
Decision consistency at this point is 70 percent and is calculated as follows: 

\[
\frac{(4 + 44 + 61 + 114 + 124 + 327 + 245) + (178 + 249 + 257)}{2{,}291} = \frac{1{,}603}{2{,}291} \approx 0.70
\]

That is, if Advanced High were the language proficiency level, 70 percent of students would be 
“consistently classified.” Students in shaded areas would be classified inconsistently. Plotting decision 
consistency percentages for all bands yields the following graph (see Exhibit A.3). 
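
The band-by-band calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study team's code: the function and variable names are assumptions, and the counts are the hypothetical values from Exhibit A.2, listed from the lowest band to the highest.

```python
from itertools import accumulate

# Hypothetical counts transcribed from Exhibit A.2, lowest band first.
BANDS = [
    "Entering Low", "Entering High", "Beginning Low", "Beginning High",
    "Intermediate Low", "Intermediate High", "Advanced Low", "Advanced High",
    "Bridging Low", "Bridging High",
]
NOT_PROFICIENT = [4, 44, 61, 114, 124, 327, 245, 331, 134, 64]
PROFICIENT = [0, 2, 0, 6, 13, 57, 81, 178, 249, 257]


def decision_consistency(not_proficient, proficient):
    """Share of consistently classified students for each candidate band.

    For a candidate band k, a student is consistently classified if the
    student is below band k and not proficient on the content assessment,
    or at or above band k and proficient on the content assessment.
    """
    total = sum(not_proficient) + sum(proficient)
    # Not-proficient students strictly below each band (running total, shifted).
    below = [0] + list(accumulate(not_proficient))[:-1]
    # Proficient students in each band or any higher band.
    at_or_above = [sum(proficient[k:]) for k in range(len(proficient))]
    return [(b + a) / total for b, a in zip(below, at_or_above)]


rates = decision_consistency(NOT_PROFICIENT, PROFICIENT)
percentages = [round(100 * rate) for rate in rates]
best_band = BANDS[rates.index(max(rates))]

for band, pct in zip(BANDS, percentages):
    print(f"{band:<18} {pct}%")
print("Highest decision consistency:", best_band)
```

Run on the Exhibit A.2 counts, the sketch gives the percentages listed in the exhibit's last column and identifies Bridging Low as the band with the highest decision consistency.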



Exhibit A.3. 

Example ELP and English or Language Arts Decision Consistency Graph 




[Figure: line graph of decision consistency percentage by composite ELP level band, Grade 5 EL students.] 



Exhibit reads: The cumulative rate of consistent decisions obtained for fifth-grade EL students in English 
language arts up through the bridging low ELP performance level is 77 percent. 

Note: Exhibit A.2 presents the number of students with ELP and English language arts proficiency level scores and 

the percentage of consistent decisions. 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Note that between the Bridging Low and Bridging High bands, the values decrease. That is, were the 
English language proficient performance standard set at the Bridging High band, fewer consistent 
decisions would be made relative to the ELA assessment than at the Bridging Low band. The band 
where decision consistency is at its highest is where deliberations about English-language proficiency 
should begin. 

Appendix B: Education Agency 1 



Exhibit B.1. 

Education Agency 1, Grade 4: Decision Consistency Analysis, 
Logistic Plot, and Box Plot (2006-07) 

[Figure: decision consistency line graph with series for Grade 4 ELA and Grade 4 Math; logistic plots titled 
“Grade 4, 2007 ELA Logistic Plot” and “Grade 4, 2007 Math Logistic Plot,” each showing estimated probability 
against ELP composite scale score (axis ticks from 230 to 730); and box plots.] 




Note: The corresponding data tables for Exhibit B.1 are Exhibits B.3 and B.4 (for the line graphs), Exhibit B.7 (for 
the logistic plots), and Exhibit B.9 (for the box plots). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit B.3. 

Education Agency 1, Grade 4: ELP and English or Language Arts 
Decision Consistency Analysis (2006-07) 

                           Number of Students With ELA Scores
ELP Level                  Not Proficient ELA   Proficient ELA   Percent of Consistent Decisions
Beginning Low              31                   0                25%
Beginning High             128                  10               26%
Early Intermediate Low     84                   0                32%
Early Intermediate High    211                  5                36%
Intermediate Low           486                  60               45%
Intermediate High          487                  200              65%
Early Advanced Low         128                  122              78%
Early Advanced High        48                   96               79%
Advanced High              14                   34               76%
Advanced Low               0                    2                75%
N (%)                      1,617 (75%)          529 (25%)

Note: Total n = 2,146 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit B.4. 

Education Agency 1, Grade 4: ELP and Mathematics 
Decision Consistency Analysis (2006-07) 

                           Number of Students With Math Scores
ELP Level                  Not Proficient Math   Proficient Math   Percent of Consistent Decisions
Beginning Low              27                    4                 48%
Beginning High             110                   28                49%
Early Intermediate Low     78                    6                 53%
Early Intermediate High    178                   38                56%
Intermediate Low           318                   226               63%
Intermediate High          297                   389               67%
Early Advanced Low         79                    171               63%
Early Advanced High        28                    116               58%
Advanced High              5                     43                54%
Advanced Low               0                     2                 52%
N (%)                      1,120 (52%)           1,023 (48%)

Note: Total n = 2,143 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit B.5. 

Education Agency 1, Grade 4: ELP and English or Language Arts 
Decision Consistency Analysis (2007-08) 

                           Number of Students With ELA Scores
ELP Level                  Not Proficient ELA   Proficient ELA   Percent of Consistent Decisions
Beginning Low              17                   1                27%
Beginning High             52                   1                28%
Early Intermediate Low     61                   0                31%
Early Intermediate High    153                  5                34%
Intermediate Low           372                  37               43%
Intermediate High          418                  142              62%
Early Advanced Low         140                  132              78%
Early Advanced High        42                   108              78%
Advanced High              26                   47               74%
Advanced Low               0                    0                73%
N (%)                      1,281 (73%)          473 (27%)

Note: Total n = 1,754 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit B.6. 

Education Agency 1, Grade 4: ELP and Mathematics 
Decision Consistency Analysis (2007-08) 

                           Number of Students With Math Scores
ELP Level                  Not Proficient Math   Proficient Math   Percent of Consistent Decisions
Beginning Low              12                    4                 54%
Beginning High             39                    15                55%
Early Intermediate Low     49                    13                56%
Early Intermediate High    111                   49                58%
Intermediate Low           248                   163               62%
Intermediate High          239                   327               66%
Early Advanced Low         74                    198               61%
Early Advanced High        24                    128               54%
Advanced High              12                    61                49%
Advanced Low               0                     0                 46%
N (%)                      808 (46%)             958 (54%)

Note: Total n = 1,766 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit B.7. 

Education Agency 1, Grade 4: Logistic Regression Results on 
English or Language Arts and Mathematics Proficiency (2006-07) 

Subject of Content Assessment   Parameter                 Estimate   Standard Error   Wald χ²(1)   Pr > χ²
ELA                             Intercept                 -18.98     1.05             329.93       <.0001
                                ELP Reading Scale Score   0.04       0.00             301.81       <.0001
                                N                         2,146
Math                            Intercept                 -10.69     0.69             236.95       <.0001
                                ELP Reading Scale Score   0.02       0.00             236.98       <.0001
                                N                         2,143

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 
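
The intercept and slope reported in Exhibit B.7 define a logistic curve of the kind shown in the logistic plots. The sketch below is an illustration only and rests on assumptions: it uses the standard logistic form p = 1 / (1 + exp(-(intercept + slope × score))), the function names are hypothetical, and the coefficients are the two-decimal rounded values from the exhibit, so any probability or score it produces is a rough approximation rather than a study result.

```python
import math


def estimated_probability(intercept, slope, score):
    """Estimated probability of content proficiency at a given ELP scale score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def score_at_probability(intercept, slope, probability):
    """ELP scale score at which the fitted curve reaches the given probability."""
    log_odds = math.log(probability / (1.0 - probability))
    return (log_odds - intercept) / slope


# Rounded ELA estimates as reported in Exhibit B.7 (2006-07).
ELA_INTERCEPT = -18.98
ELA_SLOPE = 0.04

for score in (330, 430, 530, 630):
    p = estimated_probability(ELA_INTERCEPT, ELA_SLOPE, score)
    print(f"ELP scale score {score}: estimated probability {p:.3f}")

print("Score where the curve crosses 0.5:",
      round(score_at_probability(ELA_INTERCEPT, ELA_SLOPE, 0.5), 1))
```

Because the slope is reported only to two decimal places, the crossing point computed this way is very sensitive to rounding; the unrounded estimates would be needed for any substantive use.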



Exhibit B.8. 

Education Agency 1, Grade 4: Logistic Regression Results on 
English or Language Arts and Mathematics Proficiency (2007-08) 

Subject of Content Assessment   Parameter                 Estimate   Standard Error   Wald χ²(1)   Pr > χ²
ELA                             Intercept                 -20.38     1.17             303.14       <.0001
                                ELP Reading Scale Score   0.04       0.00             281.93       <.0001
                                N                         1,754
Math                            Intercept                 -9.54      0.75             160.04       <.0001
                                ELP Reading Scale Score   0.02       0.00             167.02       <.0001
                                N                         1,766


Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit B.9. 

Education Agency 1, Grade 4: Descriptive Statistics Box Plot Analysis (2006-07) 

Subject of Content Assessment   ELP Level      Mean   Standard Deviation   Minimum   Maximum   First Quartile   Median   Third Quartile
ELA                             Beginning      286    36                   216       383       261              275      310
                                Early Int.     288    27                   220       390       268              287      304
                                Intermediate   325    32                   237       435       304              325      346
                                Early Adv.     355    32                   261       458       333              355      371
                                Advanced       378    45                   288       493       349              371      403
Math                            Beginning      300    56                   205       483       252              292      332
                                Early Int.     306    46                   210       461       271              303      336
                                Intermediate   355    55                   226       600       321              351      390
                                Early Adv.     389    60                   248       600       344              383      421
                                Advanced       418    58                   292       600       390              412      444

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit B.10. 

Education Agency 1, Grade 4: Descriptive Statistics Box Plot Analysis (2007-08) 

Subject of Content Assessment   ELP Level      Mean   Standard Deviation   Minimum   Maximum   First Quartile   Median   Third Quartile
ELA                             Beginning      273    37                   150       410       258              275      292
                                Early Int.     291    26                   233       367       274              288      310
                                Intermediate   322    32                   228       470       298              322      342
                                Early Adv.     354    33                   246       483       333              355      374
                                Advanced       370    36                   301       470       345              364      397
Math                            Beginning      315    68                   219       600       263              300      350
                                Early Int.     324    49                   219       522       289              326      355
                                Intermediate   353    55                   213       600       318              350      385
                                Early Adv.     389    57                   267       600       350              389      414
                                Advanced       412    62                   271       600       364              414      447


Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit C.3. 

Education Agency 2, Grade 7: ELP and English or Language Arts 
Decision Consistency Analysis (2006-07) 

                           Number of Students With ELA Scores
ELP Level                  Not Proficient ELA   Proficient ELA   Percent of Consistent Decisions
Pre-functional             154                  1                27%
Beginning                  291                  8                39%
Intermediate               319                  63               61%
Advanced                   171                  242              81%
Fully English Proficient   0                    30               75%
N (%)                      935 (73%)            344 (27%)

Note: Total n = 1,279 








Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit C.4. 

Education Agency 2, Grade 7: ELP and Mathematics 
Decision Consistency Analysis (2006-07) 

                           Number of Students With Math Scores
ELP Level                  Not Proficient Math   Proficient Math   Percent of Consistent Decisions
Pre-functional             181                   1                 29%
Beginning                  280                   23                43%
Intermediate               283                   99                62%
Advanced                   184                   230               76%
Fully English Proficient   5                     25                73%
N (%)                      933 (71%)             378 (29%)

Note: Total n = 1,311 








Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 






Exhibit C.5. 

Education Agency 2, Grade 7: ELP and English or Language Arts 
Decision Consistency Analysis (2007-08) 

                           Number of Students With ELA Scores
ELP Level                  Not Proficient ELA   Proficient ELA   Percent of Consistent Decisions
Pre-functional             109                  1                27%
Beginning                  283                  3                34%
Intermediate               410                  59               53%
Advanced                   277                  300              77%
Fully English Proficient   3                    35               75%
N (%)                      1,082 (73%)          398 (27%)

Note: Total n = 1,480 








Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit C.6. 

Education Agency 2, Grade 7: ELP and Mathematics 
Decision Consistency Analysis (2007-08) 

                           Number of Students With Math Scores
ELP Level                  Not Proficient Math   Proficient Math   Percent of Consistent Decisions
Pre-functional             125                   4                 39%
Beginning                  267                   26                47%
Intermediate               335                   135               63%
Advanced                   198                   380               76%
Fully English Proficient   1                     37                64%
N (%)                      926 (61%)             582 (39%)

Note: Total n = 1,508 








Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit C.7. 

Education Agency 2, Grade 7: Logistic Regression Results on 
English or Language Arts and Mathematics Proficiency (2006-07) 

Subject of Content Assessment   Parameter                 Estimate   Standard Error   Wald χ²(1)   Pr > χ²
ELA                             Intercept                 -11.08     0.64             299.52       <.0001
                                ELP Reading Scale Score   0.01       0.00             277.86       <.0001
                                N                         1,288
Math                            Intercept                 -7.61      0.45             291.55       <.0001
                                ELP Reading Scale Score   0.01       0.00             255.37       <.0001
                                N                         1,321

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit C.8. 

Education Agency 2, Grade 7: Logistic Regression Results on 
English or Language Arts and Mathematics Proficiency (2007-08) 

Subject of Content Assessment   Parameter                 Estimate   Standard Error   Wald χ²(1)   Pr > χ²
ELA                             Intercept                 -11.77     0.64             335.50       <.0001
                                ELP Reading Scale Score   0.02       0.00             303.88       <.0001
                                N                         1,484
Math                            Intercept                 -8.56      0.47             330.45       <.0001
                                ELP Reading Scale Score   0.01       0.00             312.57       <.0001
                                N                         1,512


Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit C.9. 

Education Agency 2, Grade 7: Descriptive Statistics Box Plot Analysis (2006-07) 

Subject of Content Assessment   ELP Level          Mean   Standard Deviation   Minimum   Maximum   First Quartile   Median   Third Quartile
ELA                             Pre-functional     319    132                  35        719       256              328      411
                                Beginning          470    106                  35        832       400              463      546
                                Intermediate       573    102                  35        819       505              576      645
                                Advanced           689    104                  35        960       625              694      761
                                Fully Eng. Prof.   820    81                   679       956       783              807      883
Math                            Pre-functional     529    59                   159       720       502              531      556
                                Beginning          580    62                   391       764       540              578      622
                                Intermediate       631    62                   451       827       592              628      673
                                Advanced           680    70                   106       901       640              677      720
                                Fully Eng. Prof.   748    89                   616       969       677              738.5    787

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit C.10. 

Education Agency 2, Grade 7: Descriptive Statistics Box Plot Analysis (2007-08) 

Subject of Content Assessment   ELP Level          Mean   Standard Deviation   Minimum   Maximum   First Quartile   Median   Third Quartile
ELA                             Pre-functional     209    135                  22        723       89               208.5    284
                                Beginning          416    125                  36        748       354              419      499
                                Intermediate       552    108                  163       923       488              542      630
                                Advanced           690    112                  197       959       613              681      761
                                Fully Eng. Prof.   808    92                   619       962       755              796      867
Math                            Pre-functional     548    60                   279       775       515              547      574
                                Beginning          587    57                   437       764       547              583      622
                                Intermediate       643    59                   489       907       605              639      681
                                Advanced           706    66                   503       959       657              703      753
                                Fully Eng. Prof.   779    65                   663       907       720              772      821


Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Appendix D: Education Agency 3 

Exhibit D.1. 

Education Agency 3, Grade 10: Decision Consistency Analysis, 
Logistic Plot, and Box Plot (2008-09) 

[Figure: decision consistency line graph with series for Grade 10 Reading and Grade 10 Math; logistic plots titled 
“Grade 10, 2009 Reading Proficient Logistic Plot” and “Grade 10, 2009 Mathematics Proficient Logistic Plot,” each 
showing estimated probability; and box plots of the distribution of state content reading and mathematics scale 
scores by ELP category.] 

Note: The corresponding data tables for Exhibit D.1 are Exhibits D.3 and D.4 (for the line graphs), Exhibit D.7 (for 
the logistic plots), and Exhibit D.9 (for the box plots). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit D.2. 

Education Agency 3, Grade 10: Decision Consistency Analysis, 
Logistic Plot, and Box Plot (2009-10) 

[Figure: decision consistency line graph with series for Grade 10 Reading and Grade 10 Math; logistic plots titled 
“Grade 10, 2010 Reading Proficient Logistic Plot” and “Grade 10, 2010 Mathematics Proficient Logistic Plot,” each 
showing estimated probability against ELP composite scale score; and box plots of the distribution of state 
content scale scores by ELP category.] 

Note: The corresponding data tables for Exhibit D.2 are Exhibits D.5 and D.6 (for the line graphs), Exhibit D.8 (for 
the logistic plots), and Exhibit D.10 (for the box plots). 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit D.3. 

Education Agency 3, Grade 10: ELP and Reading 
Decision Consistency Analysis (2008-09) 

                  Number of Students With Reading Scores
ELP Level         Not Proficient Reading   Proficient Reading   Percent of Consistent Decisions
Entering Low      3                        0                    39%
Entering High     69                       0                    39%
Beginning Low     51                       2                    42%
Beginning High    128                      8                    45%
Developing Low    179                      12                   50%
Developing High   309                      47                   58%
Expanding Low     177                      80                   70%
Expanding High    242                      173                  74%
Bridging Low      112                      188                  77%
Bridging High     58                       185                  74%
Reaching          16                       173                  68%
N (%)             1,344 (61%)              868 (39%)

Note: Total n = 2,212 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit D.4. 

Education Agency 3, Grade 10: ELP and Mathematics 
Decision Consistency Analysis (2008-09) 

                  Number of Students With Math Scores
ELP Level         Not Proficient Math   Proficient Math   Percent of Consistent Decisions
Entering Low      4                     0                 39%
Entering High     99                    7                 39%
Beginning Low     65                    9                 43%
Beginning High    152                   19                45%
Developing Low    169                   21                51%
Developing High   300                   63                57%
Expanding Low     167                   92                68%
Expanding High    232                   182               71%
Bridging Low      136                   164               73%
Bridging High     66                    180               72%
Reaching          30                    159               67%
N (%)             1,420 (61%)           896 (39%)

Note: Total n = 2,316 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit D.5. 

Education Agency 3, Grade 10: ELP and Reading 
Decision Consistency Analysis (2009-10) 

                  Number of Students With Reading Scores
ELP Level         Not Proficient Reading   Proficient Reading   Percent of Consistent Decisions
Entering Low      3                        2                    41%
Entering High     45                       2                    41%
Beginning Low     62                       0                    43%
Beginning High    115                      6                    45%
Developing Low    132                      6                    49%
Developing High   356                      29                   54%
Expanding Low     264                      63                   67%
Expanding High    307                      203                  75%
Bridging Low      137                      247                  79%
Bridging High     62                       260                  75%
Reaching          17                       218                  67%
N (%)             1,500 (59%)              1,036 (41%)

Note: Total n = 2,536 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit D.6. 

Education Agency 3, Grade 10: ELP and Mathematics 
Decision Consistency Analysis (2009-10) 

                  Number of Students With Math Scores
ELP Level         Not Proficient Math   Proficient Math   Percent of Consistent Decisions
Entering Low      3                     2                 42%
Entering High     49                    2                 42%
Beginning Low     70                    3                 44%
Beginning High    119                   12                46%
Developing Low    119                   22                50%
Developing High   331                   53                54%
Expanding Low     247                   80                65%
Expanding High    302                   208               72%
Bridging Low      141                   244               75%
Bridging High     86                    236               71%
Reaching          27                    207               65%
N (%)             1,494 (58%)           1,069 (42%)

Note: Total n = 2,563 

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit D.7. 

Education Agency 3, Grade 10: Logistic Regression Results on 
Reading and Mathematics Proficiency (2008-09) 

Subject of Content Assessment   Parameter                   Estimate   Standard Error   Wald χ²(1)   Pr > χ²
Reading                         Intercept                   -23.54     1.15             416.25       <.0001
                                ELP Composite Scale Score   0.06       0.00             408.69       <.0001
                                N                           2,236
Math                            Intercept                   -15.81     0.87             332.57       <.0001
                                ELP Composite Scale Score   0.04       0.00             320.92       <.0001
                                N                           2,340

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit D.8. 

Education Agency 3, Grade 10: Logistic Regression Results on 
Reading and Mathematics Proficiency (2009-10) 

Subject of Content Assessment   Parameter                   Estimate   Standard Error   Wald χ²(1)   Pr > χ²
Reading                         Intercept                   -31.78     1.38             530.09       <.0001
                                ELP Composite Scale Score   0.08       0.00             525.14       <.0001
                                N                           2,551
Math                            Intercept                   -22.05     1.06             429.24       <.0001
                                ELP Composite Scale Score   0.05       0.00             422.85       <.0001
                                N                           2,579


Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit D.9. 

Education Agency 3, Grade 10: Descriptive Statistics Box Plot Analysis (2008-09) 

Subject of Content Assessment   ELP Level    Mean   Standard Deviation   Minimum   Maximum   First Quartile   Median   Third Quartile
Reading                         Entering     393    42                   350       492       350              383      419
                                Beginning    423    49                   350       548       382              429      459
                                Developing   456    42                   350       591       433              462      483
                                Expanding    489    38                   350       617       469              492      512
                                Bridging     518    34                   350       610       496              519      542
                                Reaching     549    37                   442       688       526              544      570
Math                            Entering     480    40                   410       578       457              485      505
                                Beginning    491    41                   410       580       472              494      518
                                Developing   507    37                   410       642       486              509      530
                                Expanding    531    31                   410       626       514              534      550
                                Bridging     550    26                   453       643       532              548      567
                                Reaching     572    30                   493       671       553              575      589

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 



Exhibit D.10. 

Education Agency 3, Grade 10: Descriptive Statistics Box Plot Analysis (2009-10) 

Subject of Content Assessment   ELP Level    Mean   Standard Deviation   Minimum   Maximum   First Quartile   Median   Third Quartile
Reading                         Entering     395    55                   350       556       350              374      422
                                Beginning    401    50                   350       617       350              392      436
                                Developing   440    46                   350       613       412              445      469
                                Expanding    483    41                   350       590       459              486      509
                                Bridging     521    35                   350       606       500              523      545
                                Reaching     550    35                   350       641       530              552      571
Math                            Entering     473    48                   410       567       410              479      515
                                Beginning    474    48                   410       586       416              483      510
                                Developing   500    43                   410       618       480              507      529
                                Expanding    525    35                   410       638       509              532      547
                                Bridging     551    28                   410       639       536              552      568
                                Reaching     569    28                   483       750       552              567      584


Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Appendix E: Event History Analysis 

The survival function, S(t), is calculated as follows: 

\[
S(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}
\]

where n_i is the number of EL students who still have the potential of becoming proficient at time t_i (the 
count from the preceding period minus the number of ELs who became proficient and the number of censored 
students in that period), and d_i is the number of ELs who become proficient at time t_i. Censoring is discussed 
further below. The product, S(t), is the overall probability of not becoming proficient at time t or earlier. 
Because the focus of this study is on the probability of becoming proficient by time t, S(t) is subtracted from 1. 

Exhibit E.1 presents the number and probability of ELs identified during 2003-04 becoming 
proficient in Grades K-2 with an initial ELP level of 1. For example, the probability of becoming proficient 
through Year 1 was obtained with the following steps. First, the probability of not becoming proficient in 
Year 0 was obtained from the first factor of S(t), (7,728 - 809)/7,728, which equals 0.895. Then, the total 
number of students for Year 1 was determined by subtracting the number of students becoming proficient and 
the number of students censored from the total number of students in Year 0 (7,728 - 809 - 0), which equals 
6,919. The probability of not becoming proficient in Year 1 was obtained from the second factor, 
(6,919 - 784)/6,919, which equals 0.887. These two numbers were then multiplied together to obtain the 
estimated probability of not becoming proficient through both Years 0 and 1, which equals 0.79. To determine 
the probability of becoming proficient through Year 1, this number (0.79) is subtracted from 1, giving 0.21. 
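
The steps in this worked example can be sketched in Python as follows. This is a minimal illustration, not the study team's code: the function name and data layout are assumptions, and the counts are those reported for Grades K-2, initial ELP Level 1, in Exhibit E.1.

```python
def probability_of_becoming_proficient(initial_count, became_proficient, censored):
    """Product-limit estimate of the probability of becoming proficient by each year.

    initial_count: students who could become proficient in the first year
    became_proficient: students becoming proficient in each year
    censored: students censored in each year
    Returns one (at_risk, became_proficient, censored, probability) row per year.
    """
    at_risk = initial_count
    survival = 1.0  # running probability of NOT yet being proficient
    rows = []
    for d, c in zip(became_proficient, censored):
        if at_risk > 0:
            survival *= (at_risk - d) / at_risk
        rows.append((at_risk, d, c, 1.0 - survival))
        # Next year's count: remove newly proficient and censored students.
        at_risk -= d + c
    return rows


# Counts from Exhibit E.1 (Years 0 through 4).
rows = probability_of_becoming_proficient(
    initial_count=7728,
    became_proficient=[809, 784, 430, 837, 0],
    censored=[0, 464, 416, 342, 3646],
)

for year, (at_risk, d, c, p) in enumerate(rows):
    print(f"Year {year}: total {at_risk:>5}, proficient {d:>3}, "
          f"censored {c:>4}, probability {p:.2f}")
```

The yearly totals and the rounded probabilities produced by this sketch match the corresponding columns of Exhibit E.1.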



Exhibit E.1. 

Number and Probability of ELs Identified During 2003-04 Becoming Proficient, 
in Grades K-2 and Initial ELP Level 1, Education Agency 1 

Years   Total Number of Students   Number of Students Becoming Proficient   Number of Students Censored   Probability of Becoming Proficient
0       7,728                      809                                      0                             0.10
1       6,919                      784                                      464                           0.21
2       5,671                      430                                      416                           0.27
3       4,825                      837                                      342                           0.39
4       3,646                      0                                        3,646                         0.39



Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit E.2. 

Censored Adjustment 1: Underestimate 
Number and Probability of ELs Identified During 2003-04 Becoming Proficient, 
in Grades K-5, by ELP Level, Education Agency 1 

Group                     Years   Total Number   Number Becoming   Number     Probability of        Standard   95% Confidence Interval
                                  of Students    Proficient        Censored   Becoming Proficient   Error      Lower    Upper
Grades K-2, ELP Level 1   1       7,728          809               0          0.10                  0.00       0.10     0.11
                          2       6,919          784               464        0.21                  0.00       0.20     0.22
                          3       5,671          430               416        0.27                  0.01       0.26     0.28
                          4       4,825          837               342        0.39                  0.01       0.38     0.41
                          5       3,646          0                 3,646      0.39                  0.01       0.38     0.41
Grades K-2, ELP Level 2   1       7,603          2,184             0          0.29                  0.01       0.28     0.30
                          2       5,419          1,070             420        0.43                  0.01       0.42     0.44
                          3       3,929          382               281        0.48                  0.01       0.47     0.50
                          4       3,266          757               231        0.60                  0.01       0.59     0.61
                          5       2,278          0                 2,278      0.60                  0.01       0.59     0.61
Grades K-2, ELP Level 3   1       10,045         5,271             0          0.52                  0.01       0.52     0.53
                          2       4,774          1,618             329        0.69                  0.00       0.68     0.69
                          3       2,827          464               226        0.74                  0.00       0.73     0.75
                          4       2,137          668               162        0.82                  0.00       0.81     0.83
                          5       1,307          0                 1,307      0.82                  0.00       0.81     0.83
Grades 3-5, ELP Level 1   1       1,043          60                0          0.06                  0.01       0.05     0.07
                          2       983            91                110        0.14                  0.01       0.12     0.17
                          3       782            112               139        0.27                  0.01       0.24     0.30
                          4       531            96                80         0.40                  0.02       0.37     0.43
                          5       355            0                 355        0.40                  0.02       0.37     0.43
Grades 3-5, ELP Level 2   1       235            75                0          0.32                  0.03       0.26     0.38
                          2       160            18                23         0.40                  0.03       0.34     0.46
                          3       119            26                29         0.53                  0.03       0.46     0.60
                          4       64             14                15         0.63                  0.04       0.56     0.70
                          5       35             0                 35         0.63                  0.04       0.56     0.70
Grades 3-5, ELP Level 3   1       335            215               0          0.64                  0.03       0.59     0.69
                          2       120            23                19         0.71                  0.02       0.66     0.76
                          3       78             14                25         0.76                  0.02       0.71     0.81
                          4       39             11                10         0.83                  0.02       0.78     0.87
                          5       18             0                 18         0.83                  0.02       0.78     0.87


Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Exhibit E.3. 

Censored Adjustment 2: Overestimate 
Number and Probability of ELs Identified During 2003-04 Becoming Proficient, 
in Grades K-2, by ELP Level, Education Agency 1 

Group                     Years   Total Number   Number Becoming   Number     Probability of        Standard   95% Confidence Interval
                                  of Students    Proficient        Censored   Becoming Proficient   Error      Lower    Upper
Grades K-2, ELP Level 1   1       7,728          809               0          0.10                  0.00       0.10     0.11
                          2       6,919          784               0          0.21                  0.00       0.20     0.22
                          3       6,135          430               0          0.26                  0.01       0.25     0.27
                          4       5,705          837               0          0.37                  0.01       0.36     0.38
                          5       4,868          0                 0          0.37                  0.01       0.36     0.38
                          6       4,868          0                 0          0.37                  0.01       0.36     0.38
                          7       4,868          0                 4,868      0.37                  0.01       0.36     0.38
Grades K-2, ELP Level 2   1       7,603          2,184             0          0.29                  0.01       0.28     0.30
                          2       5,419          1,070             0          0.43                  0.01       0.42     0.44
                          3       4,349          382               0          0.48                  0.01       0.47     0.49
                          4       3,967          757               0          0.58                  0.01       0.57     0.59
                          5       3,210          0                 0          0.58                  0.01       0.57     0.59
                          6       3,210          0                 3,210      0.58                  0.01       0.57     0.59
Grades K-2, ELP Level 3   1       10,045         5,271             0          0.52                  0.01       0.52     0.53
                          2       4,774          1,618             0          0.69                  0.00       0.68     0.69
                          3       3,156          464               0          0.73                  0.00       0.72     0.74
                          4       2,692          668               0          0.80                  0.00       0.79     0.81
                          5       2,024          0                 2,024      0.80                  0.00       0.79     0.81

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 






Exhibit E.4. 

Censored Adjustment 2: Overestimate 
Number and Probability of ELs Identified During 2003-04 Becoming Proficient, 
in Grades 3-5, by ELP Level, Education Agency 1 

Group                     Years   Total Number   Number Becoming   Number     Probability of        Standard   95% Confidence Interval
                                  of Students    Proficient        Censored   Becoming Proficient   Error      Lower    Upper
Grades 3-5, ELP Level 1   1       1,043          60                0          0.06                  0.01       0.05     0.07
                          2       983            91                0          0.14                  0.01       0.12     0.17
                          3       892            112               0          0.25                  0.01       0.23     0.28
                          4       780            96                0          0.34                  0.01       0.32     0.37
                          5       684            0                 0          0.34                  0.01       0.32     0.37
                          6       684            0                 0          0.34                  0.01       0.32     0.37
                          7       684            0                 684        0.34                  0.01       0.32     0.37
Grades 3-5, ELP Level 2   1       235            75                0          0.32                  0.03       0.26     0.38
                          2       160            18                0          0.40                  0.03       0.34     0.46
                          3       142            26                0          0.51                  0.03       0.44     0.57
                          4       116            14                0          0.57                  0.03       0.50     0.63
                          5       102            0                 0          0.57                  0.03       0.50     0.63
                          6       102            0                 102        0.57                  0.03       0.50     0.63
Grades 3-5, ELP Level 3   1       335            215               0          0.64                  0.03       0.59     0.69
                          2       120            23                0          0.71                  0.02       0.66     0.76
                          3       97             14                0          0.75                  0.02       0.71     0.80
                          4       83             11                0          0.79                  0.02       0.74     0.83
                          5       72             0                 72         0.79                  0.02       0.74     0.83


Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 

Appendix F: Education Agency 1 



Exhibit F.1. 

Education Agency 1, Grade 3: Descriptive Statistics Box Plot Analysis (2007-08) 

Subject of Content Assessment   ELP Level      Mean   Standard Deviation   Minimum   Maximum   First Quartile   Median   Third Quartile
ELA                             Beginning      248    31.9                 166       447       229              242      266
                                Early Int.     265    31.3                 156       413       242              262      285
                                Intermediate   299    33.3                 182       464       278              300      322
                                Early Adv.     333    34.3                 219       487       311              334      356
                                Advanced       362    40.9                 202       600       334              356      390
Math                            Beginning      282    64.4                 150       600       236              273      310
                                Early Int.     302    56.7                 150       600       264              296      336
                                Intermediate   349    60.9                 162       600       306              346      384
                                Early Adv.     398    66.6                 200       600       352              391      437
                                Advanced       439    72.3                 211       600       391              437      486

Source: National Evaluation of Title III Implementation student-level longitudinal achievement data sets. 